Abstract

We present a new oblivious RAM that supports variable-sized storage blocks (vORAM), which is 
the first ORAM to allow varying block sizes without trivial padding. We also present a new
history-independent data structure (a HIRB tree) that can be stored within a vORAM. Together, this construction
provides an efficient and practical oblivious data structure (ODS) for a key/value map, and goes further to 
provide an additional privacy guarantee as compared to prior ODS maps: even upon client compromise, 
deleted data and the history of old operations remain hidden to the attacker. We implement and measure 
the performance of our system using Amazon Web Services, and the single-operation time for a realistic 
database (up to 2^® entries) is less than 1 second. This represents a 100x speed-up compared to the current
best oblivious map data structure (which provides neither secure deletion nor history independence) by 
Wang et al. (CCS 14). 


1 Introduction 

1.1 Motivation 

Increasingly, organizations and individuals are storing large amounts of data in remote, shared cloud servers. 
For sensitive data, it is important to protect the privacy not only of the data itself but also of the access to 
the metadata that may contain which records have been accessed and when, thereby revealing properties 
of the underlying data, even if that data is encrypted. There are multiple points of potential information 
leakage in this setting: an adversary could observe network communication between the client and server; 
an adversary could compromise the cloud itself, observing the data stored at the server, possibly including 
mirrored copies or backups; an adversary could observe the computations performed by the remote server; 
the adversary may compromise the locally-stored client data; or, finally, the adversary may compromise the 
data in multiple ways, e.g., a complete compromise of both the remotely stored cloud storage and locally- 
stored client storage.*

While a complete compromise will inevitably reveal private data, we seek data storage mechanisms 
which maximize privacy while maintaining reasonable, practical efficiency, at any level of compromise. For 
generality, we assume a computationally-limited server which may only store and retrieve blocks of raw 
data, and we focus on the most basic (and perhaps most important) data structure: a key/value map. 

Oblivious RAM (ORAM). With a computationally-limited server, the access pattern of client-server
communication reveals the entire history of the remote data store. This access pattern, even if the actual data is
encrypted, may leak sensitive information about the underlying stored data, such as keyword search queries
or encryption keys [12, 20, 42].

*We assume an honest-but-curious server throughout, and leave achieving an ODS with malicious servers as an open problem.

A generic solution to protect against access pattern leakage is oblivious RAM (ORAM) [13], which 
obscures the operation being performed (read/write), the address on which it operates, and the contents of 
the underlying data. Any program (with the possible necessity of some padding) can be executed using an 
ORAM to hide the access patterns to the underlying data. 

A great number of ORAM schemes have been recently proposed, most aiming to improve the efficiency 
as it relates to the recursive index structure, which is typically required to store the hidden locations of items 
within the ORAM (for example [11, 16, 21, 24, 36, 37] and references therein). However, an important 
aspect overlooked by previous work is the size of data items themselves. The vORAM construction we 
propose provides an affirmative answer to the following question: 

Can an oblivious RAM hide the size of varying-sized items, with greater efficiency than that 

achieved by trivial padding? 

Oblivious data structure (ODS). Recently, Wang et al. [40] showed that it is possible to provide obliviousness
more efficiently if the specifics of the target program are considered. In particular, among other results,
Wang et al. achieved an oblivious data structure (ODS) scheme for a key-value map, by constructing an AVL
tree on a non-recursive ORAM without using the position map. Their scheme requires O(log n) ORAM
blocks of client storage, where n is the maximum number of allowable data items. More importantly, due
to the lack of position map lookups, the scheme requires only O(log^2 n) blocks of communication bandwidth,
which constitutes roughly an O(log n)-multiplicative improvement in communication bandwidth over the
generic ORAM solution. We briefly explain the "pointer-based technique" they introduced to eliminate
the position map in Section 1.3.

Making oblivious data structures practical is challenging, however, owing to the combination of
inefficiencies in the data structures compounded with that of the underlying ORAM. In our experimental 
results presented in Section 6, and Table 1 specifically, we found that the AVL ODS suffers greatly from a 
high round complexity, and also that the per-operation bandwidth exceeds the total database size (and hence 
a trivial alternative implementation) until the number of entries exceeds 1 million. 

Similar observations for ORAMs more generally were made recently by Bindschaedler et al. [4], who 
examined existing ORAM alternatives in a realistic cloud setting, and found many theoretical results lacking 
in practice. We ask a related question for ODS, and answer it in the affirmative with our HIRB data structure
stored in vORAM: 

Can an oblivious map data structure be made practically useful in the cloud setting? 

Catastrophic attack. In the cloud storage scenario, obliviousness will protect the client’s privacy from any 
observer of network traffic or from the cloud server itself. However, if the attacker compromises the client 
and obtains critical information such as the encryption keys used in the ODS, all the sensitive information 
stored in the cloud will simply be revealed to the attacker. 

We call this scenario a catastrophic attack, and it is important to stress that this attack is quite realistic. 
The client machine may be stolen or hacked, or it may even be legally seized due to a subpoena. 

Considering the increasing incidence of high-profile catastrophic attacks in practice (e.g., [1, 19]), and 
that even government agencies such as the CIA are turning to third-party cloud storage providers [23], it is
important to provide some level of privacy in this attack scenario. Given this reality, we ask and answer the 
following additional question: 

Can we provide any privacy guarantee even under a catastrophic attack? 
Specifically, our vORAM+HIRB construction will provide strong security for deleted data, as well as a
weaker (yet optimal) security for the history of past operations, after complete client compromise. 

1.2 Security Requirements 

Motivated by the goals outlined previously, we aim to construct a cloud database system that provides the 
following two security properties: 

• Obliviousness: The system should hide both the data and the access patterns from an observer of all 
client-server communication (i.e., be an ODS). 

• Secure Deletion and History Independence: The system, in the face of a catastrophic attack, should 
ensure that no previously deleted data, the fact that previous data existed, or the order in which extant 
data has been accessed, is revealed to an attacker. 

Additionally, we require that the system be practically useful, meaning it should be more efficient (w.r.t.
communication cost, access time, and round complexity) than previous ODS schemes, even those that
provide neither secure deletion nor history independence.

Each required security notion has individually been the focus of numerous recent research efforts (see 
Section 2). To the best of our knowledge, however, there is no previous work that considers all the properties 
simultaneously. We aim at combining the security properties of obliviousness, secure deletion, and history
independence into a new, unified system for secure remote cloud storage. The previous ODS schemes
provide neither history independence nor secure deletion and are inefficient for even small data stores. Previous
mechanisms providing secure deletion or history independence are more efficient, but do not hide the
access pattern in remote cloud storage (i.e., do not provide obliviousness). Unfortunately, the specific
requirements of these constructions mean they cannot be combined in a straightforward way.

To better understand the necessity of each of the security requirements, consider each in kind. 

Obliviousness: The network traffic to a remote server reveals to an attacker, or to the server itself, which raw 
blocks are being read and written. Even if the block contents are encrypted, an attacker may be able 
to infer sensitive information from this access pattern itself. Like previous ODS schemes, our system
will ensure this is not the case; the server-level access pattern reveals nothing about the underlying 
data operations that the user is performing. 

History independence: By inspecting the internal structure of the currently existing data in the cloud after 
a catastrophic attack, the attacker may still be able to infer information about which items were recently
accessed or the likely prior existence of a record even if that record was previously deleted [2].
However, if an ODS scheme provided perfect history independence, the catastrophic attacker could not
infer which sequence of operations was applied, among all the sequences that could have resulted in
the current set of the data items. Interestingly, we show that it is impossible to achieve perfect history
independence in our setting with a computationally-limited server; nonetheless, providing ℓ-history
independence is still desirable, where only the most recent ℓ operations are revealed but nothing else.

Secure deletion: Given that only bounded history independence is possible, the privacy of deleted data must 
be considered. It is desirable that the catastrophic attacker should not be able to guess information 
about deleted data. In practice, data deleted from persistent media, such as hard disk drives, is easily 
recoverable through standard forensic tools. In the cloud setting, the problem is compounded because 
there is normally no direct control of how and where data is stored on physical disks, or backed up and 
duplicated in servers around the globe. We follow a similar approach as [34], where secure deletion is 
accomplished by re-encrypting and deleting the old encryption key from local, erasable memory such 
as RAM. 
1.3 Our Work

Pointer-based technique. Wang et al. [40] designed an ODS scheme for map by storing an AVL tree on top 
of the non-recursive Path ORAM [37] using the pointer-based technique, in which the ORAM position tags 
act as pointers, and the pointer to each node in the AVL tree is stored in its parent node. With this technique, 
when the parent node is fetched, the position tags of its children are immediately obtained. Therefore, the 
position map lookups are no longer necessary.

Similarly, in our ODS scheme, we will overlay a data structure on a non-recursive ORAM using a 
pointer-based technique for building the data structure. 
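
The pointer-based technique can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding rather than the construction of [40] or of this paper: `ToyStore` stands in for a non-recursive ORAM, `NUM_LEAVES` is an arbitrary number of positions, and a plain binary search tree replaces the AVL tree. The point is only that each parent stores the current position tag of its children, so a traversal never consults a position map, and every fetched node receives a fresh tag that is written back into its parent.

```python
import secrets

NUM_LEAVES = 8  # hypothetical number of ORAM leaf positions


class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        # Each child pointer is a pair (node_id, position_tag), or None.
        self.children = {"left": None, "right": None}


class ToyStore:
    """Stand-in for a non-recursive ORAM: a block is fetched by id and position tag."""

    def __init__(self):
        self.blocks = {}  # node_id -> (position_tag, Node)

    def fetch(self, node_id, pos):
        stored_pos, node = self.blocks.pop(node_id)
        if stored_pos != pos:
            raise KeyError("stale position tag")
        return node

    def store(self, node_id, node):
        pos = secrets.randbelow(NUM_LEAVES)  # fresh random tag on every write
        self.blocks[node_id] = (pos, node)
        return pos


def lookup(store, root_ptr, key):
    """Walk down from the root; return (value or None, new root pointer)."""
    path, ptr, value = [], root_ptr, None
    while ptr is not None:
        node_id, pos = ptr
        node = store.fetch(node_id, pos)
        side = None
        if key == node.key:
            value, ptr = node.value, None
        else:
            side = "left" if key < node.key else "right"
            ptr = node.children[side]  # the tag of the child comes from its parent
        path.append((node_id, node, side))
    # Write back bottom-up so each parent records the new tag of its child.
    child_ptr = None
    for i, (node_id, node, side) in enumerate(reversed(path)):
        if i > 0:
            node.children[side] = child_ptr
        child_ptr = (node_id, store.store(node_id, node))
    return value, child_ptr
```

Because only the parent holds a node's tag, a node with several incoming links would need all of them updated on every access, which is why the structure must be a rooted tree.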

We stress that the non-recursive Path ORAM still remains the best choice when we would like to embed 
our data structure in an ORAM with the pointer-based technique, in spite of all the recent improvements 
on ORAM techniques. This is mainly because all ORAM improvement techniques consider the setting 
where an ORAM runs in a stand-alone fashion, unlike our setting where the ORAM actions, in particular 
with position map lookups, depend on the upper-layer data structure. In particular, with the non-recursive 
Path ORAM, each ORAM operation takes only a single round of communication between the client and 
server, since there is no position map lookup; moreover, each operation transfers O(log n) blocks where the
size of each block can be arbitrarily small up to O(log n). To compare the non-recursive Path ORAM with
the most recent stand-alone ORAMs, each operation of the constant communication ORAM [29] transfers
O(1) blocks each of which should be of size O(log^n), and it additionally uses computation-intensive
homomorphic encryptions. Ring ORAM [35] still refers to the position map, and although its online
stage may be comparable to the non-recursive Path ORAM, it still has the additional offline stage. The
non-recursive versions of these ORAMs have essentially the same efficiency as the non-recursive Path ORAM.

Impracticality of existing data structures. Unfortunately, no current data structure exists that can meet 
our security and efficiency requirements: 

• It should be a rooted tree. This is necessary, since we would like to use the pointer-based technique. 
Because the positions are randomly re-selected on any access to that node, the tree structure is important
in order to avoid dangling references to old pointers.

• The height of the tree should be O(logn) in the worst case. To achieve obliviousness, all operations 
must execute with the same running time, which implies all operations will be padded to some upper 
bound that is dependent on the height of the tree. 

• The data structure itself should be (strongly) history-independent, meaning the organization of nodes 
depends only on the current contents, and not the order of operations which led to the current state. As 
a negative example, consider an AVL tree, which is not history independent. Inserting the records A, 
B, C, D in that order; or B, C, D, A in that order; or A, B, C, D, E and then deleting E; will each result 
in a different state of the data structure, thereby revealing (under a catastrophic attack) information on 
the insertion order and previous deletions. 
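
The AVL counterexample in the last requirement can be checked with a short sketch. This is a textbook AVL insertion written for illustration (the helper names `build`, `shape`, and `inorder` are ours), and it covers only the two insertion orders, not the deletion case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    update(n)
    balance = height(n.left) - height(n.right)
    if balance > 1:                      # left-heavy
        if key > n.left.key:             # left-right case
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if balance < -1:                     # right-heavy
        if key < n.right.key:            # right-left case
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def shape(n):
    """Nested-tuple view of the tree layout."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)
```

Calling `shape(build("ABCD"))` and `shape(build("BCDA"))` yields trees with the same contents but different roots, so an attacker who sees the layout learns something about the insertion order.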

To the best of our knowledge, there is no data structure satisfying all of the above conditions. Most
tree-based solutions, including AVL trees and B-trees, are not history independent. Treaps and B-treaps are
rooted trees with history independence, but they have linear height in the worst case. Skip-lists and
B-Skip-lists are history independent and tree-like, but technically they are not rooted trees and thereby not amenable
to the pointer-based technique. That is, Skip-lists and B-Skip-lists have multiple incoming links, requiring
linear updates in the ORAM to maintain the pointers and position tags in the worst case.

HIRB. We developed a new data structure, called a HIRB tree (history independent, randomized B-tree), 
that satisfies all the aforementioned requirements. Conceptually, it is a fixed height B-tree such that when
each item is inserted, the level in the HIRB tree is determined by log_β n trials of (pseudorandom) biased coin
flipping where β is the block factor. The tree may split or merge depending on the situation, but it never
rotates. The fixed height of the tree, i.e. H = 1 + log_β n, is very beneficial for efficiency. In particular,
every operation visits at most 2H nodes, which greatly saves on padding costs, compared to the ODS scheme
of [40] where each AVL tree operation must be padded up to visiting 3 · 1.44 · lg n nodes.

The HIRB is described more carefully in Section 5, with full details in the appendix.
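
As a rough illustration of how a level can be derived from pseudorandom coin flips, consider the sketch below. It is not the algorithm of Section 5: the heads probability of 1/β, the use of SHA-256 as the source of coins, the convention that level 0 is the bottom, and the names `num_trials` and `hirb_level` are all assumptions made for this example. What it does show is that the level depends only on the label, never on the order of insertions.

```python
import hashlib


def num_trials(n: int, beta: int) -> int:
    """Smallest t >= 1 with beta**t >= n, i.e. roughly log_beta(n)."""
    t, cap = 0, 1
    while cap < n:
        cap *= beta
        t += 1
    return max(1, t)


def hirb_level(label: bytes, n: int, beta: int) -> int:
    """Count consecutive heads among the hash-derived coins (heads w.p. 1/beta)."""
    digest = int.from_bytes(hashlib.sha256(label).digest(), "big")
    level = 0
    for _ in range(num_trials(n, beta)):
        digest, coin = divmod(digest, beta)  # next base-beta digit of the hash
        if coin != 0:                        # tails: stop promoting
            break
        level += 1
    return level
```

With β = 16 and n = 2^20 there are five trials, and about 15 out of every 16 labels stay at level 0, which matches the intuition that most items live in the lowest level of a B-tree.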

vORAM. One challenge with HIRB trees is that the number of items each tree node contains is variable,
and in the unlucky case, a node may become too large for an ORAM bucket to store.

This challenge is overcome by introducing vORAM (ORAM with variable-size blocks). The design of 
vORAM is based on the non-recursive version of Path ORAM where the bucket size remains fixed, but each 
bucket may contain as many variable-size blocks (or parts of blocks) as the bucket space allows. Blocks
may also be stored across multiple buckets (in the same path).

We observe that the irregularity of the HIRB node sizes can be smoothed over O(log n) buckets from the
vORAM root to a vORAM leaf, and we prove that the stash size on the client can still be small, O(log n),
with high probability. We note that vORAM is the first ORAM that deals with variable size blocks, and may
be of independent interest.

The vORAM is described carefully in Section 4, and the full details are provided in the appendix. 
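
To make the idea of fixed-size buckets holding variable-size blocks concrete, here is a simplified packing sketch. It is not the vORAM eviction procedure of Section 4: it assumes every block is eligible for the path being written, it fills buckets greedily from the leaf toward the root, and `BUCKET_BYTES`, `HEADER_BYTES`, `pack_path`, and `reassemble` are hypothetical names and sizes chosen for the example.

```python
BUCKET_BYTES = 16  # hypothetical fixed bucket capacity
HEADER_BYTES = 2   # hypothetical per-chunk overhead (block id and length)


def pack_path(blocks, num_buckets):
    """Greedily fill the buckets of one path, leaf first, splitting blocks as needed.

    blocks: list of (block_id, payload bytes).
    Returns (buckets, stash); buckets[0] is the root, buckets[-1] the leaf.
    """
    buckets = [[] for _ in range(num_buckets)]
    free = [BUCKET_BYTES] * num_buckets
    stash = []
    level = num_buckets - 1
    for block_id, payload in blocks:
        rest = payload
        while rest and level >= 0:
            room = free[level] - HEADER_BYTES
            if room <= 0:          # bucket is full: move one step toward the root
                level -= 1
                continue
            chunk, rest = rest[:room], rest[room:]
            buckets[level].append((block_id, chunk))
            free[level] -= HEADER_BYTES + len(chunk)
        if rest:                   # whatever does not fit stays on the client
            stash.append((block_id, rest))
    return buckets, stash


def reassemble(buckets, stash):
    """Concatenate chunks per block id in the order they were packed."""
    out = {}
    for bucket in reversed(buckets):   # leaf first, matching pack_path
        for block_id, chunk in bucket:
            out[block_id] = out.get(block_id, b"") + chunk
    for block_id, chunk in stash:
        out[block_id] = out.get(block_id, b"") + chunk
    return out
```

In this toy version a 20-byte block can straddle three 16-byte buckets on the same path, which is the smoothing effect described above: an oversized node borrows slack from its neighbours instead of forcing every bucket to be padded to the worst case.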

Secure deletion. Finally, for secure deletion, a parent vORAM bucket contains the encryption keys of
both children. When a bucket is modified, it is encrypted with a fresh key; then the encryption keys in the
parent are modified accordingly, which recursively affects all its ancestors. However, we stress that in each
vORAM operation, leaf-to-root refreshing takes place anyway, so adding this mechanism is bandwidth-free.
Additionally, instead of using the label of each item directly in the HIRB, we use the hash of the label.
This way, we can remove the dependency between the item location in the HIRB and its label (with security
proven in the random oracle model).
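
The key-chaining idea can be sketched on a single root-to-leaf path. The code below is an illustration under loud assumptions: it uses one child per bucket instead of two, and `keystream_xor` is a toy cipher built from SHA-256 that is NOT secure and only stands in for a real authenticated encryption scheme. The names `write_path` and `read_path` are ours.

```python
import hashlib
import secrets

KEY_LEN = 16


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher for illustration only (NOT secure): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def write_path(payloads):
    """Encrypt a root-to-leaf chain of bucket payloads under fresh keys.

    Each bucket's plaintext is child_key || payload.
    Returns (root_key, ciphertexts); only root_key needs erasable memory.
    """
    child_key = bytes(KEY_LEN)          # the leaf has no child: placeholder key
    ciphertexts = []
    for payload in reversed(payloads):  # leaf first, so a parent can embed its child's key
        key = secrets.token_bytes(KEY_LEN)
        ciphertexts.append(keystream_xor(key, child_key + payload))
        child_key = key
    ciphertexts.reverse()
    return child_key, ciphertexts


def read_path(root_key, ciphertexts):
    """Decrypt from the root down, recovering each child's key from its parent."""
    key, payloads = root_key, []
    for ct in ciphertexts:
        plain = keystream_xor(key, ct)
        key, payload = plain[:KEY_LEN], plain[KEY_LEN:]
        payloads.append(payload)
    return payloads
```

After the path is rewritten with `write_path`, the previous root key is overwritten in erasable memory. Any old ciphertexts still sitting in backups can then no longer be opened, since every key below the root was reachable only through the old root key.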

Imperfect history independence. Our approach does not provide perfect history independence. Although 
the data structure in the vORAM is history independent, the vORAM is not. Indeed, in any tree-based or 
hierarchical ORAM, the items near the root are more likely to have been accessed recently than items
near the leaves. The catastrophic adversary can observe the entire ORAM structure, and such leakage breaks
perfect history independence. We show a formal lower bound for the amount of leakage in Section 3. 

Experiments and efficiency of our scheme. In order to empirically measure the performance of our 
construction, we first performed an analysis to determine the smallest constant factor overhead to achieve 
high performance with negligible likelihood of failure. Following this, we implemented our system in the 
cloud with Amazon Web Services as the cloud provider and compared it to alternatives that provide some, 
but not all of the desired security properties. To the best of our knowledge, there has been no previous work 
that implements and tests any ODS system in the actual cloud setting. As argued in Bindschaedler et al. [4],
who independently compared various ORAM systems in the cloud, it is important to see how systems 
work in the actual intended setting. As comparison points, we compare our system with the following 
implementations: 

• ORAM+AVL: We reimplemented the ODS map by Wang et al. [40] that provides obliviousness but
neither secure deletion nor history independence.

• SD-B-Tree: We implemented a remotely stored, block-level, encrypted B-Tree (as recommended by
the secure deletion community [34]) that provides secure deletion but neither history independence nor
obliviousness.

• Naive approach: We implemented a naive approach that achieves all the security properties by
transferring and re-encrypting the entire database on each access.
In all cases the remotely stored B-Tree is the fastest as it requires the least amount of communication cost
(no obliviousness). For similar reasons, vORAM+HIRB is much faster than the baseline as the number of 
items grows (starting from 2^^ items), since the baseline requires communication that is linear in the number 
of items. We also describe a number of optimizations (such as concurrent connections and caching) that
enable vORAM+HIRB to be competitive with the baseline even when storing as few as 2® items. Without
optimizations, the access time is on the order of a few seconds; with optimizations,
access times are less than one second.

Surprisingly, however, the vORAM+HIRB is 20x faster than ORAM+AVL, irrespective of the number of 
items, even though ORAM+AVL does not support history independence or secure deletion. We believe this 
is mainly because vORAM+HIRB requires much smaller round complexity. Two factors drive the round 
complexity improvement: 

Much smaller height: While each AVL tree node contains only one item, each HIRB node contains β items
on average, and is able to take advantage of slightly larger buckets which optimize the bandwidth to 
remote cloud storage by storing the same amount of data in trees with smaller height. 

Much less padding: AVL tree operations sometimes get complicated with balancing and rotations, due to
which each operation should be padded up to 3 · 1.44 · lg n node accesses. However, HIRB operations
are simple, do not require rotations, and thus, each operation accesses at most 2 log_β n nodes.
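
The two padding bounds can be compared numerically. The snippet below simply evaluates the two expressions quoted above; the choice of n = 2^20 and β = 16 is an arbitrary example and not a parameter setting taken from the experiments.

```python
import math


def avl_padded_accesses(n: int) -> float:
    """Node accesses per padded AVL operation: 3 * 1.44 * lg n."""
    return 3 * 1.44 * math.log2(n)


def hirb_padded_accesses(n: int, beta: int) -> float:
    """Node accesses per HIRB operation: at most 2 * log_beta n."""
    return 2 * math.log2(n) / math.log2(beta)


if __name__ == "__main__":
    n, beta = 2**20, 16  # hypothetical example values
    print(avl_padded_accesses(n), hirb_padded_accesses(n, beta))
```

For these example values the AVL bound is about 86 node accesses per operation while the HIRB bound is 10, and each node access in a pointer-based ODS costs a round trip to the server.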

Although the Path-ORAM bucket for ORAM+AVL is four times smaller than the vORAM bucket in our
implementation, this affects bandwidth but not the round complexity. The fully optimized vORAM+HIRB
protocol is about 100x faster than ORAM+AVL. We describe the details of our experiments in Section 6.

Summary of our contributions. To summarize, the contributions of this paper are: 

• New security definitions of history independence and secure deletion under a catastrophic attack. 

• The design and analysis of an oblivious RAM with variable size blocks, the vORAM; 

• The design and analysis of a new history independent and randomized data structure, the HIRB tree; 

• A lower bound on history independence for any ORAM construction with sub-linear bandwidth; 

• Improvements to the performance of mapped data structures stored in ORAMs; 

• An empirical measurement of the settings and performance of the vORAM in the actual cloud setting; 

• The implementation and measurement of the vORAM+HIRB system in the actual cloud setting. 


2 Related Work 

We discuss related work in oblivious data structures, history independence, and secure deletion. Our system 
builds upon these prior results and combines the security properties into a unified system. 

ORAM and oblivious data structures. ORAM protects the access pattern from an observer such that it 
is impossible to determine which operation is occurring, and on which item. The seminal work on the topic 
is by Goldreich and Ostrovsky [13], and since then, many works have focused on improving the efficiency of
ORAM in space, time, and communication cost (for example [11, 16, 21, 24, 36, 37],
to name a few; see the references therein).

There have been works addressing individual oblivious data structures to accomplish specific tasks, such 
as priority queues [39], stacks and queues [27], and graph algorithms [5]. Recently, Wang et al. [40] achieved 
oblivious data structures (ODS) for maps, priority queues, stacks, and queues much more efficiently than 
previous works or naive implementation of the data structures on top of ORAM. 

Our vORAM construction builds upon the non-recursive Path ORAM [40] and allows variable sized 
data items to be spread across multiple ORAM buckets. Although our original motivation was to store 
differing-sized B-tree nodes from the HIRB, there may be wider applicability to any context where the size
of data (as well as its contents and access patterns) needs to be hidden.

Interestingly, based on our experimental results, we believe the ability of vORAM to store partial blocks 
in each bucket may even improve the performance of ORAM when storing uniformly-sized items. However, 
we will not consider this further in the current investigation. 

History independence. History independence of data structures requires that the current organization of the 
data within the structure reveals nothing about the prior operations thereon. Micciancio [26] first considered 
history independence in the context of 2-3 trees, and the notions of history independence were formally 
developed in [8, 17, 30]. The notion of strong history independence [30] holds if for any two sequences 
of operations, the distributions of the memory representations are identical at all time-points that yield the 
same storage content. Moreover, a data structure is strongly history independent if and only if it has a 
unique representation [17]. There have been uniquely-represented constructions for hash functions [6, 31] 
and variants of a B-tree (a B-treap [14], and a B-skip-list [15]). We adopt the notion of unique representation 
for history independence when developing our history independent, randomized B-tree, or HIRB tree.

We note that history independence of these data structures considers a setting where a single party runs 
some algorithms on a single storage medium, which doesn’t correctly capture the actual cloud setting where 
client and server have separate storage, execute protocols, and exchange messages to maintain the data 
structures. Therefore, we extend the existing history independence and give a new, augmented notion of 
history independence for the cloud setting with a catastrophic attack. 

Independently, the recent work of [3] also considers a limited notion of history independence, called
Δ-history independence, parameterized with a function Δ that describes the leakage. Our definition of history
independence has a similar notion, where the leakage function Δ captures the number of recent operations
which may be revealed in a catastrophic attack.

Secure deletion. Secure deletion means that data deleted cannot be recovered, even by the original owner. 
It has been studied in many contexts [33], but here we focus on the cloud setting, where the user has little or 
no control over the physical media or redundant duplication or backup copies of data. In particular, we build 
upon secure deletion techniques from the applied cryptography community. The approach is to encrypt all 
data stored in the cloud with encryption keys stored locally in erasable memory, so that deleting the keys 
will securely delete the remote data by rendering it non-decryptable. 

Boneh and Lipton [7] were the first to use encryption to securely remove files in a system with backup
tapes. The challenge since has been to more effectively manage encrypted content and the processes of
re-encryption and erasing decryption keys. For example, Di Crescenzo et al. [10] showed a more efficient
method for secure deletion using a tree structure applied in the setting of a large non-erasable persistent
medium and a small erasable medium. Several works considered secure deletion mechanisms for a versioning
file system [32], an inverted index in a write-once-read-many compliance storage [28], and a B-tree (and
generally a mangrove) [34].

3 Preliminaries 

We assume that readers are familiar with security notions of standard cryptographic primitives [22]. Let $\lambda$
denote the security parameter.

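Modeling data structures. Following the approach from the secure deletion literature, we use two storage
types: erasable memory and persistent storage. Contents deleted from erasable memory are non-recoverable,
while the contents in persistent storage cannot be fully erased. We assume the size of erasable
memory is small while the persistent storage has a much larger capacity. This mimics the cloud computing
setting where cloud storage is large and persistent due to lack of user control, and local storage is more
expensive but also controlled directly.

Pre-Print version 2015-11-23

\[
\begin{array}{l}
\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, h, b) \\
\quad acc_0 \leftarrow \mathcal{D}.\mathsf{Init}(1^{\lambda}, n); \\
\quad (\vec{op}^{(0)}, \vec{op}^{(1)}, ST) \leftarrow \mathcal{A}_1(1^{\lambda}, acc_0); \\
\quad \vec{acc} \leftarrow \mathcal{D}.\vec{op}^{(b)}(); \\
\quad \textbf{if } h = 1: \\
\quad\quad \textbf{return } \mathcal{A}_2(ST, \vec{acc}, \mathcal{D}.em); \\
\quad \textbf{else} \\
\quad\quad \textbf{return } \mathcal{A}_2(ST, \vec{acc});
\end{array}
\]

\[
\begin{array}{l}
\mathsf{EXP}^{\mathsf{sdel}}_{\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3}(\mathcal{D}, \lambda, n, b) \\
\quad acc_0 \leftarrow \mathcal{D}.\mathsf{Init}(1^{\lambda}, n); \\
\quad d_0 \leftarrow \mathcal{A}_1(1^{\lambda}, 0); \\
\quad d_1 \leftarrow \mathcal{A}_1(1^{\lambda}, 1); \\
\quad (\vec{op}_{d_0,d_1}, S) \leftarrow \mathcal{A}_2(acc_0, d_0, d_1); \\
\quad \vec{acc} \leftarrow \mathcal{D}.(\vec{op}_{d_0,d_1} \bowtie_S d_b)(); \\
\quad \textbf{return } \mathcal{A}_3(acc_0, \vec{acc}, \mathcal{D}.em);
\end{array}
\]

Figure 1: Experiments for security definitions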

We define a data structure $\mathcal{D}$ as a collection of data that supports initialization, insertion, deletion, and
lookup, using both the erasable memory and the persistent storage. Each operation may be parameterized
by some operands (e.g., lookup by a label). For a data structure $\mathcal{D}$ stored in this model, let $\mathcal{D}.em$ and $\mathcal{D}.ps$
denote the contents of the erasable memory and persistent storage, respectively. For example, an encrypted
graph structure may be stored in $\mathcal{D}.ps$ while the decryption key resides in $\mathcal{D}.em$. For an operation $op$ on
$\mathcal{D}$, let $acc \leftarrow \mathcal{D}.op()$ denote executing the operation $op$ on the data structure $\mathcal{D}$, where $acc$ is the access
pattern over the persistent storage during the operation. The access pattern to erasable memory is assumed
to be hidden. For a sequence of operations $\vec{op} = (op_1, \ldots, op_m)$, let $\vec{acc} \leftarrow \mathcal{D}.\vec{op}()$ denote applying the
operations on $\mathcal{D}$, that is, $acc_1 \leftarrow \mathcal{D}.op_1(), \ldots, acc_m \leftarrow \mathcal{D}.op_m()$, with $\vec{acc} = (acc_1, \ldots, acc_m)$. We note
that the access pattern $\vec{acc}$ completely determines the state of persistent storage $\mathcal{D}.ps$.

Obliviousness and history independence. Obliviousness requires that an adversary without access to
erasable memory cannot obtain any information about the actual operations performed on the data structure $\mathcal{D}$
other than the number of operations. This security notion is defined through an experiment obl-hi, given in
Figure 1, where $\mathcal{D}$, $\lambda$, $n$, $h$, and $b$ denote a data structure, the security parameter, the maximum number of items
$\mathcal{D}$ can contain, a flag for history independence, and the challenge choice, respectively.

In the experiment, the adversary chooses two sequences of operations on the data structure and tries to 
guess which sequence was chosen by the experiment with the help of access patterns. The data structure 
provides obliviousness if every polynomial-time adversary has only a negligible advantage. 

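Definition 1. For a data structure $\mathcal{D}$, consider the experiment $\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 0, b)$ with adversary
$\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$. We call the adversary $\mathcal{A}$ admissible if $\mathcal{A}_1$ always outputs two sequences with the same number of
operations storing at most $n$ items. We define the advantage of the adversary $\mathcal{A}$ in this experiment as:

\[
\mathsf{Adv}^{\mathsf{obl}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) = \left| \Pr\!\left[\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 0, 0) = 1\right] - \Pr\!\left[\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 0, 1) = 1\right] \right|
\]

We say that $\mathcal{D}$ provides obliviousness if for any sufficiently large $\lambda$, any $n \in \mathrm{poly}(\lambda)$, and any PPT admissible
adversary $\mathcal{A}$, we have $\mathsf{Adv}^{\mathsf{obl}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) \le \mathrm{negl}(\lambda)$.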
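
The shape of this experiment can be mimicked with a toy harness. The sketch below is a deliberate simplification and not the formal game: operations are plain read indices, the state $ST$ and the initialization pattern $acc_0$ are omitted, and both toy structures (`DirectArray`, `LinearScan`) and the adversary functions `choose` and `guess` are hypothetical. Since everything here is deterministic, each probability is either 0 or 1.

```python
class DirectArray:
    """Leaky toy structure: each read touches exactly the requested index."""

    def __init__(self, n):
        self.n = n

    def run(self, ops):
        return [[index] for index in ops]


class LinearScan:
    """Trivially oblivious toy structure: each read touches every index."""

    def __init__(self, n):
        self.n = n

    def run(self, ops):
        return [list(range(self.n)) for _ in ops]


def experiment(structure_cls, n, choose, guess, b):
    """One run of the simplified game with challenge bit b (h = 0)."""
    ds = structure_cls(n)
    ops0, ops1 = choose(n)
    assert len(ops0) == len(ops1)   # admissibility: equal-length sequences
    acc = ds.run((ops0, ops1)[b])   # the adversary sees only the access pattern
    return guess(acc)


def advantage(structure_cls, n, choose, guess):
    """|Pr[EXP(b=0) = 1] - Pr[EXP(b=1) = 1]| for a deterministic toy setting."""
    return abs(experiment(structure_cls, n, choose, guess, 0)
               - experiment(structure_cls, n, choose, guess, 1))


def choose(n):
    return [0, 0, 0], [1, 1, 1]     # two sequences of three reads each


def guess(acc):
    return 1 if acc[0] == [1] else 0
```

Against `DirectArray` this adversary has advantage 1, while against `LinearScan` it has advantage 0, at the price of touching all $n$ locations per operation. That cost is what the naive baseline in Section 1.3 pays and what an ORAM is designed to avoid.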

Now we define history independence. As we will see, perfect history independence is inherently at
odds with obliviousness and sub-linear communication cost. Therefore, we instead define parameterized history
independence, which allows for a relaxation of the security requirement. The parameter determines
the allowable leakage of recent history of operations. One can interpret a history-independent data structure
with leakage of $\ell$ operations as follows: Although the data structure may reveal some of the $\ell$ most recent operations
applied to itself, it does not reveal any information about older operations, except that the total sequence
resulted in the current state of data storage.
The experiment in this case is equivalent to that for obliviousness, except that (1) the two sequences
must result in the same state of the data structure at the end, (2) the last $\ell$ operations in both sequences must
be identical, and (3) the adversary gets to view the local, erasable memory as well as the access pattern to
persistent storage.

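Definition 2. For a data structure $\mathcal{D}$, consider the experiment $\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 1, b)$ with adversary
$\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$. We call the adversary $\mathcal{A}$ $\ell$-admissible if $\mathcal{A}_1$ always outputs sequences $\vec{op}^{(0)}$ and $\vec{op}^{(1)}$ which
have the same number of operations and result in the same set storing at most $n$ data items, and the last $\ell$
operations of both are identical. We define the advantage of an adversary $\mathcal{A}$ in this experiment as:

\[
\mathsf{Adv}^{\mathsf{hi}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) = \left| \Pr\!\left[\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 1, 0) = 1\right] - \Pr\!\left[\mathsf{EXP}^{\mathsf{obl\text{-}hi}}_{\mathcal{A}_1,\mathcal{A}_2}(\mathcal{D}, \lambda, n, 1, 1) = 1\right] \right|
\]

We say that the data structure $\mathcal{D}$ provides history independence with leakage of $\ell$ operations if for any
sufficiently large $\lambda$, any $n \in \mathrm{poly}(\lambda)$, and any PPT $\ell$-admissible adversary $\mathcal{A}$, we have
$\mathsf{Adv}^{\mathsf{hi}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) \le \mathrm{negl}(\lambda)$.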


Lower bound on history independence. Unfortunately, the history independence property is inherently 
at odds with the nature of oblivious RAM. The following lower bound demonstrates that there is a linear 
tradeoff between the amount of history independence and the communication bandwidth of any ORAM 
mechanism. 

Theorem 1. Any oblivious RAM storage system with a bandwidth of $k$ bytes per access achieves at best
history independence with leakage of $\Omega(n/k)$ operations in storing $n$ blocks.

The intuition behind the proof^2 is that, in a catastrophic attack, an adversary can observe which
persistent storage locations were recently accessed, and furthermore can decrypt the contents of those
locations because they have the keys from erasable memory. This will inevitably reveal information to the
attacker about the order and contents of recent accesses, up to the point at which all $n$ elements have been
touched by the ORAM and no further past information is recoverable.

Admittedly this lower bound limits what may be achievable in terms of history independence. But still, 
leaking only a known maximum number of prior operations is better than (potentially) leaking all of them! 

Consider, by contrast, an AVL tree implemented within a standard ORAM as in prior work. Using
the fact that AVL tree shapes reveal information about past operations, the adversary can come up with
two sequences of operations such that (i) the first operations of each sequence result in a distinct AVL tree
shape but the same data items, and (ii) the same read operations, as many as necessary, follow at the end.
With the catastrophic attack, the adversary will simply observe the tree shape and make a correct guess.
This argument holds for any data structure whose shape reveals information about past operations; such
structures therefore have no upper bound on the amount of history leakage.

Secure deletion. Perfect history independence implies secure deletion. However, the above lower bound
shows that complete history independence will not be possible in our setting. So, we consider a
complementary security notion that requires strong security for the deleted data. Secure deletion is defined
through an experiment sdel, given in Figure 1. In the experiment, $\mathcal{A}_1$ chooses two data items $d_0$ and $d_1$ at
random, based on which $\mathcal{A}_2$ outputs $(\vec{op}_{d_0,d_1}, S)$. Here, $\vec{op}_{d_0,d_1}$ denotes a vector of operations containing
neither $d_0$ nor $d_1$, and $S = (s_1, s_2, \ldots, s_m)$ is a monotonically increasing sequence. The experiment then
injects $d_b$ into $\vec{op}_{d_0,d_1}$ according to $S$. In particular, "insert $d_b$" is placed at position $s_1$; for example, if
$s_1$ is 5, this insert operation is placed right before the 6th operation of $\vec{op}_{d_0,d_1}$. Then, "look-up $d_b$" is
placed at positions $s_2, \ldots, s_{m-1}$, and finally "delete $d_b$" at $s_m$.
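
The injection step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `inject`, the tuple encoding of operations, and the reading of "monotonically increasing" as strictly increasing are ours and do not come from the formal experiment in Figure 1.

```python
def inject(ops, positions, item):
    """Inject insert / look-up / delete operations on `item` into `ops`.

    `positions` plays the role of S = (s_1, ..., s_m). An operation at
    position s is placed right before the (s+1)-th operation of `ops`
    (assumption: positions are strictly increasing and within range).
    """
    if len(positions) < 2:
        raise ValueError("need at least an insert and a delete position")
    if any(a >= b for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    if positions[0] < 0 or positions[-1] > len(ops):
        raise ValueError("positions must lie within the sequence")
    # s_1 is the insert, s_2..s_{m-1} are look-ups, s_m is the delete.
    kinds = ["insert"] + ["look-up"] * (len(positions) - 2) + ["delete"]
    injected = dict(zip(positions, kinds))
    result = []
    for index in range(len(ops) + 1):
        if index in injected:
            result.append((injected[index], item))
        if index < len(ops):
            result.append(ops[index])
    return result
```

With seven original operations and positions (5, 6, 7), the insert lands right before the 6th original operation, matching the example in the text.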

^2 Full proofs for the main theorems may be found in Appendix C.
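Definition 3. For a data structure $\mathcal{D}$, consider the experiment $\mathsf{EXP}^{\mathsf{sdel}}_{\mathcal{A}}(\mathcal{D}, \lambda, n, b)$ with adversary
$\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3)$. We call the adversary $\mathcal{A}$ admissible if for any data item $d$ that $\mathcal{A}_1(1^\lambda, 0)$ (resp.,
$\mathcal{A}_1(1^\lambda, 1)$) outputs, the probability that $\mathcal{A}_1$ outputs $d$ is negligible in $\lambda$, i.e., the output of $\mathcal{A}_1$ forms a
high-entropy distribution; moreover, the sequence of operations from $\mathcal{A}_2$ must store at most $n$ items. We
define the advantage of $\mathcal{A}$ as:

$$\mathsf{Adv}^{\mathsf{sdel}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) = \left| \Pr[\mathsf{EXP}^{\mathsf{sdel}}_{\mathcal{A}}(\mathcal{D}, \lambda, n, 0) = 1] - \Pr[\mathsf{EXP}^{\mathsf{sdel}}_{\mathcal{A}}(\mathcal{D}, \lambda, n, 1) = 1] \right|$$

We say that the data structure $\mathcal{D}$ provides secure deletion if for any sufficiently large $\lambda$, any
$n \in \mathrm{poly}(\lambda)$, and any PPT admissible adversary $\mathcal{A}$, we have $\mathsf{Adv}^{\mathsf{sdel}}_{\mathcal{A}}(\mathcal{D}, \lambda, n) \le \mathsf{negl}(\lambda)$.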

Note that our definition is stronger than just requiring that the adversary cannot recover the deleted 
item; for any two high entropy distributions chosen by the adversary, the adversary cannot tell from which 
distribution the deleted item was drawn. 


4 ORAM with variable-size blocks (vORAM) 

The design of vORAM is based on the non-recursive version of Path ORAM [37], but we are able to add
more flexibility by allowing each ORAM bucket to contain as many variable-size blocks (or parts of blocks)
as the bucket space allows. We will show that vORAM preserves obliviousness and maintains a small stash
as long as the size of variable blocks can be bounded by a geometric probability distribution, which is the
case for the HIRB that we intend to store within the vORAM. To support secure deletion, we also store
encryption keys within each bucket for its two children, and these keys are re-generated on every access,
similarly to other work on secure deletion [10, 34].

Parameters. The vORAM construction is governed by the following parameters: 

• The height T of the vORAM tree: The vORAM is represented as a complete binary tree of buckets
with height T (the levels of the tree are numbered 0 to T), so the total number of buckets is $2^{T+1} - 1$.
T also controls the total number of allowable data blocks, which is $2^T$.

• The bucket size Z: Each bucket has Z bits, and this Z must be at least some constant times the
expected block size B for what will be stored in the vORAM.

• The stash size parameter R: Blocks (or partial blocks) that overflow from the root bucket are stored
temporarily in an additional memory bank in local storage called the stash, which can contain up to
$R \cdot B$ bits.

• Block collision parameter $\gamma$: Each block will be assigned a random identifier id; these identifiers will
all be distinct at every step with probability $1 - \mathsf{negl}(\gamma)$.

Bucket structure. Each bucket is split into two areas: header and data. See Figure 2 for a pictorial
description. The header area contains two encryption keys for the two child buckets. The data area contains
a sequence of (possibly partial) blocks, each preceded by a unique identifier string and the block data length.
The end of the data area is filled with 0 bits, if necessary, to pad up to the bucket size Z.

Figure 2: A single vORAM bucket with i partial blocks.

Figure 3: A sample vORAM state with partial blocks with $id_0$, $id_1$, $id_2$, $id_3$. Note that the partial blocks
for $id_0$ are opportunistically filled up the vORAM from leaf to root, and the remaining partial blocks are
then placed in the stash.

Each $id_i$ uniquely identifies a block and also encodes the path of buckets along which the block should
reside. Partial blocks share the same identifier, with each length $l$ indicating how many bytes of the block
are stored in that bucket. Recovering the full block is accomplished by scanning from the stash along the
path associated with id (see Figure 3). We further require the first bit of each identifier to be always 1 in
order to differentiate between zero padding and the start of the next identifier. Moreover, to avoid collisions
in identifiers, the length of each identifier is extended to $2T + \gamma + 1$ bits, where $\gamma$ is the collision parameter
mentioned above. The most significant $T + 1$ bits of the identifier (including the fixed leading 1-bit) are
used to match a block to a leaf, or equivalently, a path from root to leaf in the vORAM tree.
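
The identifier layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the heap-style bucket numbering (root = 1, children of bucket i are 2i and 2i + 1) are assumptions made for this example. With that numbering, the top T + 1 bits of an identifier, including the leading 1-bit, are exactly the index of the leaf.

```python
import secrets

def new_identifier(T, gamma):
    """Random identifier of 2T + gamma + 1 bits whose leading bit is 1."""
    free_bits = 2 * T + gamma
    return (1 << free_bits) | secrets.randbits(free_bits)

def path_buckets(identifier, T, gamma):
    """Bucket indices from the root (level 0) to the leaf (level T).

    Assumes heap-style numbering: the root is 1 and bucket i has
    children 2i and 2i + 1 (an assumption of this sketch).
    """
    leaf = identifier >> (T + gamma)  # keep the top T + 1 bits
    return [leaf >> (T - level) for level in range(T + 1)]
```

Each bucket on the returned path is the parent of the next one, so the list can be walked root-to-leaf for eviction and leaf-to-root for writeback.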

vORAM operations. Our vORAM construction supports the following operations. 

• insert(blk) $\mapsto$ id. Inserts the given block blk of data into the ORAM and returns a new, randomly-
generated id to be used only once at a later time to retrieve the original contents.

• remove(id) $\mapsto$ blk. Removes the block corresponding to id and returns the original data blk as a
sequence of bytes.

• update(id, callback) $\mapsto$ $id^+$. Given id and a user-defined function callback, performs
insert(callback(remove(id))) in a single step.

Each vORAM operation involves two phases: 

1. evict(id). Decrypt and read the buckets along the path from the root to the leaf encoded in the identifier 
id, and remove all the partial blocks along the path, merging partial blocks that share an identifier, and 
storing them in the stash. 

2. writeback(id). Encrypt all blocks along the path encoded by id with new encryption keys and
opportunistically store any partial blocks from stash, dividing blocks as necessary, filling from the leaf to
the root.

An insert operation first evicts a randomly-chosen path, then inserts the new data item into the stash with 
a second randomly-chosen identifier, and finally writes back the originally-evicted path. A remove operation 
evicts the path specified by the identifier, then removes that item from the stash (which must have had all 
its partial blocks recombined along the evicted path), and finally writes back the evicted path without the 
deleted item. The update operation evicts the path from the initial id, retrieves the block from the stash,
passes it to the callback function, re-inserts the result into the stash with a new random $id^+$, and finally
calls writeback on the original id. A full pseudocode description of all these operations is provided in Appendix A.
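
To make the two phases concrete, the following Python sketch models the operations on a toy, unencrypted vORAM. It is not the pseudocode of Appendix A. The class name `ToyVORAM`, the heap-style bucket numbering, measuring the bucket size Z in payload bytes (ignoring headers, identifiers and length fields), and the choice to keep the front of a split block in the stash are all simplifying assumptions of this sketch.

```python
import secrets

class ToyVORAM:
    """Unencrypted toy model; each bucket holds up to Z payload bytes."""

    def __init__(self, T, Z, gamma=40):
        self.T, self.Z, self.gamma = T, Z, gamma
        self.buckets = {}  # heap index -> list of (id, fragment)
        self.stash = {}    # id -> bytes (the front part of a block)

    def _new_id(self):
        bits = 2 * self.T + self.gamma
        return (1 << bits) | secrets.randbits(bits)

    def _path(self, ident):
        leaf = ident >> (self.T + self.gamma)
        return [leaf >> (self.T - lvl) for lvl in range(self.T + 1)]

    def evict(self, ident):
        # Read root to leaf, merging fragments that share an identifier.
        for index in self._path(ident):
            for bid, frag in self.buckets.pop(index, []):
                self.stash[bid] = self.stash.get(bid, b"") + frag

    def writeback(self, ident):
        path = self._path(ident)
        for lvl in range(self.T, -1, -1):  # fill from the leaf to the root
            space, bucket = self.Z, []
            for bid in list(self.stash):
                if space == 0:
                    break
                if self._path(bid)[lvl] != path[lvl]:
                    continue  # this bucket is not on the block's own path
                data = self.stash[bid]
                take = min(space, len(data))
                # Store the tail in the bucket; the front stays in the stash.
                bucket.append((bid, data[len(data) - take:]))
                space -= take
                if take == len(data):
                    del self.stash[bid]
                else:
                    self.stash[bid] = data[:len(data) - take]
            self.buckets[path[lvl]] = bucket

    def insert(self, blk):
        bogus = self._new_id()  # randomly chosen path to evict
        self.evict(bogus)
        new_id = self._new_id()
        self.stash[new_id] = bytes(blk)
        self.writeback(bogus)
        return new_id

    def remove(self, ident):
        self.evict(ident)
        blk = self.stash.pop(ident)
        self.writeback(ident)
        return blk

    def update(self, ident, callback):
        self.evict(ident)
        blk = self.stash.pop(ident)
        new_id = self._new_id()
        self.stash[new_id] = bytes(callback(blk))
        self.writeback(ident)
        return new_id
```

Because fragments are cut from the tail of a block while filling leaf to root, concatenating the stash part with the fragments found root to leaf restores the original byte order. A real implementation would additionally re-key and encrypt every bucket on the path during writeback.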

Security properties. For obliviousness, any insert, remove, or update operation is computationally
indistinguishable based on its access pattern because the identifier of each block is used only once to retrieve
that item and then immediately discarded. Each remove or update trivially discards the identifier after reading
the path, and each insert evicts buckets along a bogus, randomly chosen path before returning a fresh $id^+$ to
be used as the new identifier for that block.

Theorem 2. The vORAM provides obliviousness. 

Secure deletion is achieved via key management of buckets. Every evict and writeback causes a
path's worth of buckets to be re-encrypted and re-keyed, including the root bucket. Buckets containing any
removed data may persist, but the decryption keys are erased since the root bucket is re-encrypted, rendering
the data unrecoverable. Similarly, recovering any previously deleted data reduces to acquiring the old root
key, which was securely deleted from local, erasable memory.

However, each evict and writeback will disclose the vORAM path being accessed, which must be
handled carefully to ensure no leakage occurs. Fortunately, identifiers (and therefore vORAM paths as well) are
uniformly random, independent of the deleted data and revealing no information about them.

Theorem 3. The vORAM provides secure deletion. 

Regarding history independence, although any removed items are unrecoverable, the height of each 
item in the vORAM tree, as well as the history of accesses to each vORAM tree bucket, may reveal some 
information about the order, or timing, of when each item was inserted. Intuitively, items appearing closer 
to the root level of the vORAM are more likely to have been inserted recently, and vice versa. However, if 
an item is inserted and then later has its path entirely evicted due to some other item’s insertion or removal, 
then any history information of the older item is essentially wiped out; it is as if that item had been removed 
and re-inserted. Because the identifiers used in each operation are chosen at random, after some $O(n \log n)$
operations it is likely that every path in the vORAM has been evicted at least once.
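
The $O(n \log n)$ figure above matches the classical coupon-collector estimate: if each operation evicts one of $n$ leaf paths uniformly at random, the expected number of operations until every leaf has been chosen is $n \cdot H_n$, where $H_n$ is the $n$-th harmonic number. The short Python sketch below evaluates that expectation exactly. It is only a back-of-the-envelope aid under this uniformity assumption and is not part of the formal proof; the function name is ours.

```python
from fractions import Fraction

def expected_ops_to_touch_all_leaves(n):
    """Coupon-collector expectation n * H_n for n equally likely leaves."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))
```

Since $H_n$ grows like $\ln n$, the expectation grows like $n \ln n$.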

Theorem 4. The vORAM provides history independence with leakage of $O(n \log n + \lambda n)$ operations.

In fact, we can achieve asymptotically optimal leakage with only a constant-factor blowup in the
bandwidth. Every vORAM operation involves reading and writing a single path. Additionally, after each
operation, we can evict and then re-write a complete subtree of size $\lg n$, which contains $(\lg n)/2 - 1$ leaf
buckets, in a deterministically chosen dummy operation. The dummy operation simply reads the buckets into
the stash and then rewrites them with no change in contents, while allowing the blocks evicted by the dummy
operation and those evicted by the access to all move between levels of the vORAM as usual. The number of
nodes evicted will be less than $2 \lg n$, to encompass the subtree itself as well as the path of buckets to the
root of the subtree, and hence the total bandwidth for the operation remains $O(\log n)$.

The benefit of this approach is that if these dummy subtree evictions are performed sequentially across
the vORAM tree on each operation, any sequence of $n / \lg n$ operations is guaranteed to have evicted every
bucket in the vORAM at least once. Hence this would achieve history independence with only $O(n / \log n)$
leakage, which matches the lower bound of Theorem 1 and is therefore optimal up to constant factors.

Stash size. Our vORAM construction maintains a small stash as long as the size of variable blocks can 
be bounded by a geometric probability distribution, which is the case for the HIRB that we intend to store 
within the vORAM. 
Theorem 5. Consider a vORAM with T levels, collision parameter $\gamma$, storing at most $n = 2^T$ blocks,
where the length $l$ of each block is chosen independently from a distribution such that $E[l] = B$ and
$\Pr[l > mB] < 0.5^m$. Then, if the bucket size Z satisfies $Z \ge 20B$, for any $R \ge 1$, and after any single access to
the vORAM, we have

$$\Pr[|\mathit{stash}| > RB] < 28 \cdot (0.883)^R.$$

Note that the constants 28 and 0.883 are technical artifacts of the analysis, and do not matter except to 
say that 0.883 < 1 and thus the failure probability decreases exponentially with the size of stash. 
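
As a worked illustration of how the bound in Theorem 5 translates into a stash size, the sketch below solves $28 \cdot (0.883)^R \le 2^{-s}$ for the smallest integer $R$. The function name and the idea of expressing the target failure probability as $2^{-s}$ are our own conventions for this example; the constants 28 and 0.883 are the ones stated in the theorem.

```python
import math

def stash_blocks_for(security_bits):
    """Smallest integer R with 28 * 0.883**R <= 2**(-security_bits)."""
    needed = security_bits * math.log(2) + math.log(28)
    return math.ceil(needed / -math.log(0.883))
```

Because the bound decays geometrically in $R$, each additional bit of security costs a constant number of extra expected-size blocks of stash, roughly $\ln 2 / (-\ln 0.883)$.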

As a corollary, for a vORAM storing at most n blocks, the cloud storage requirement is $40Bn$ bits, and
the bandwidth for each operation amounts to $40B \lg n$ bits. However, this is a theoretical upper bound, and
our experiments in Section 6 show that smaller constants suffice: setting $Z = 6B$ and $T = \lceil \lg n - 1 \rceil$
stabilizes the stash, so that the actual storage requirement and bandwidth per operation are $6Bn$ and $12B \lg n$
bits, respectively.
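
The storage and bandwidth figures above follow from counting buckets: a tree of height T has $2^{T+1} - 1$ buckets of Z bits each, and every operation reads and then rewrites the $T + 1$ buckets on one path. The following sketch, with a function name of our choosing, reproduces that arithmetic so that other parameter choices can be checked.

```python
def voram_costs(T, Z):
    """Return (storage_bits, bandwidth_bits_per_operation) for a vORAM tree.

    Storage counts every bucket in a complete binary tree of height T;
    bandwidth counts one read plus one rewrite of a root-to-leaf path.
    """
    buckets = 2 ** (T + 1) - 1
    storage_bits = Z * buckets
    bandwidth_bits = 2 * Z * (T + 1)
    return storage_bits, bandwidth_bits
```

For example, with $n = 2^{10}$, $T = \lg n - 1 = 9$ and $Z = 6B$, the tree has 1023 buckets, so storage is just under $6Bn$ bits and each operation moves exactly $12B \lg n$ bits.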

Furthermore, to avoid failure due to stash overflow or collisions, the client storage R and collision
parameter $\gamma$ should both grow slightly faster than $\log n$, i.e., $R, \gamma \in \omega(\log n)$.

5 HIRB Tree Data Structure

We now use the vORAM construction described in the previous section to implement a data structure
supporting the operations of a dictionary that maps labels to values. In this paper, we intentionally use the word
"labels" rather than the word "keys" to distinguish from the encryption keys that are stored in the vORAM.

Motivating the HIRB. Before describing the construction and properties of the history independent,
randomized B-Tree (HIRB), we first wish to motivate the need for the HIRB as it relates to the security and
efficiency requirements of storing it within the vORAM:

• The data structure must be easily partitioned into blocks that have expected size bounded by a
geometric distribution for vORAM storage.

• The data structure must be pointer-based, and the structure of blocks and pointers must form a directed
graph that is an arborescence, such that there exists at most one pointer to each block. This is because
a non-recursive ORAM uses random identifiers for storage blocks, which must change on every read
or write to that block.

• The memory access pattern for an operation (e.g., get, set, or delete) must be bounded by a fixed
parameter to ensure obliviousness; otherwise the number of vORAM accesses could leak information
about the data access.

• Finally, the data structure must be uniquely represented such that the pointer structures and
contents are determined only by the set of (label, value) pairs stored within, up to some randomization
performed during initialization. Recall that strong history independence is provided via a unique
representation, a sufficient and necessary condition [17] for the desired security property.

In summary, we require a uniquely-represented, tree-based data structure with bounded height. While a 
variety of uniquely represented (or strongly history independent) data structures have been proposed in the 
literature [14, 30], we are not aware of any that satisfy all of the requisite properties. 

While some form of hash table might seem like an obvious choice, we note that such a structure would 
violate the second condition above; namely, it would be impossible to store a hash table within an ORAM 
without having a separate position map, incurring an extra logarithmic factor in the cost. As it turns out, our 
HIRB tree does use hashing in order to support secure deletion, but this is only to sort the labels within the
tree data structure.

Overview of HIRB tree. The closest data structure to the HIRB is the B-Skip List [15]; unfortunately, a skip
list does not form a tree. The HIRB is essentially equivalent to a B-Skip List after sorting labels according
to a hash function and removing pointers between skip-nodes to impose a top-down tree structure.

Recall that a typical B-tree consists of large nodes, each with an array of (label, value) pairs and child
nodes. A B-tree node has branching factor $k$, and we call it a $k$-node, if the node contains $k - 1$ labels,
$k - 1$ values, and $k$ children (as in Figure 4). In a typical B-tree, the branching factor of each node is
allowed to vary in some range $[B + 1, 2B]$, where $B$ is a fixed parameter of the construction that controls
the maximum size of any single node.



Figure 4: B-tree node with branching factor k 

HIRB tree nodes differ from typical B-tree nodes in two ways. First, instead of storing the label in
the node, a cryptographic hash^3 of the label is stored. This is necessary to support secure deletion of
vORAM+HIRB even when the nature of vORAM leaks some history of operations; namely, revealing which
HIRB node an item was deleted from should not reveal the label that was deleted.

The second difference from a normal B-tree node is that the branching factor of each node, rather
than being limited to a fixed range, can take any value $k \in [1, \infty)$. This branching factor will observe a
geometric distribution for storage within the vORAM. In particular, it will be a random variable X drawn
independently from a geometric distribution with expected value $\beta$, where $\beta$ is a parameter of the HIRB tree
construction.

The height of a node in the HIRB tree is defined as the length of the path from that node to a leaf
node; all leaf nodes are the same distance to the root node for B-trees. The height of a new insertion of
(label, value) in the HIRB is determined by a series of pseudorandom biased coin flips based on the hash of
the label^4. The distribution of selected heights for insertions uniquely determines the structure of the HIRB
tree because the process is deterministic, and thus the HIRB is uniquely-represented.

Parameters and preliminaries. Two parameters are fixed at initialization: the expected branching factor
$\beta$, and the height $H$. In addition, throughout this section we will write $n$ as the maximum number of distinct
labels that may be stored in the HIRB tree, and $\gamma$ as a parameter that affects the length of hash digests^5.

A HIRB tree node with branching factor $k$ consists of $k - 1$ label hashes, $k - 1$ values, and $k$ vORAM
identifiers which represent pointers to the child nodes. This is described in Figure 5, where $h_i$ indicates
$\mathrm{Hash}(\mathrm{label}_i)$.

Similar to the vORAM itself, the length of the hash function should be long enough to reduce the
probability of collision below $2^{-\gamma}$, so define $|\mathrm{Hash}(\mathrm{label})| = \max(2H \lg \beta + \gamma, \lambda)$, and define $\mathrm{nodesize}_k$ to
be the size of a HIRB tree node with branching factor $k$, given as

$$\mathrm{nodesize}_k = (k + 1)(2T + \gamma + 1) + k(|\mathrm{Hash}(\mathrm{label})| + |\mathrm{value}|),$$

^3 We need a random oracle for formal security. In practice, we used SHA1 initialized with a random string
chosen when the HIRB tree is instantiated.

^4 Note that this choice of heights is more or less the same as the randomly-chosen node heights in a skip list.

^5 The parameter $\gamma$ for HIRB and vORAM serves the same purpose in avoiding collisions in identifiers, so for
simplicity we assume they are the same.
Figure 5: HIRB node with branching factor k.


where we write $|\mathrm{value}|$ as an upper bound on the size of the largest value stored in the HIRB. (Recall that
the size of each vORAM identifier is $2T + \gamma + 1$.) Each HIRB tree node will be stored as a single block in
the vORAM, so that a HIRB node with branching factor $k$ will ultimately be a vORAM block with length
$\mathrm{nodesize}_k$.
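
The two size formulas above are easy to evaluate directly. The Python sketch below does so; the function names are ours, sizes are in bits, and rounding $2H \lg \beta$ up to a whole number of bits is an assumption of this example rather than something the text specifies.

```python
import math

def hash_bits(H, beta, gamma, lam):
    """|Hash(label)| = max(2 H lg(beta) + gamma, lambda), rounded up to whole bits."""
    return max(math.ceil(2 * H * math.log2(beta)) + gamma, lam)

def nodesize(k, T, gamma, hash_len, value_bits):
    """nodesize_k = (k + 1)(2T + gamma + 1) + k(|Hash(label)| + |value|)."""
    id_bits = 2 * T + gamma + 1  # length of one vORAM identifier
    return (k + 1) * id_bits + k * (hash_len + value_bits)
```

For instance, with $T = 20$ and $\gamma = 40$ each identifier takes 81 bits, so a node with $k = 1$, a 160-bit hash and a 64-bit value occupies $2 \cdot 81 + 224 = 386$ bits.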

As $\beta$ reflects the expected branching factor of a node, it must be an integer greater than or equal to 1.
This parameter controls the efficiency of the tree and should be chosen according to the size of vORAM
buckets. In particular, using the results of Theorem 5 in the previous section, and the HIRB node size
defined above, one would choose $\beta$ according to the inequality $20 \cdot \mathrm{nodesize}_\beta \le Z$, where Z is the size of
each vORAM bucket. According to our experimental results in Section 6, the constant 20 may be reduced
to 6.

The height $H$ must be set so that $H \ge \log_\beta n$; otherwise we risk the root node growing too large. We
assume that $H$ is fixed at all times, which is easily handled when an upper bound $n$ is known a priori.

HIRB tree operations. As previously described, the entries in a HIRB node are sorted by the hash of the
labels, and the search path for a label is also according to the label hashes. A lookup operation for a label
requires fetching each HIRB node along the search path from the vORAM and returning the matching value.

Initially, an empty HIRB tree of height H is created, as shown in Figure 6. Each node has a branching
factor of 1 and contains only the single vORAM identifier of its child.

Figure 6: Empty HIRB with height H (a chain of H + 1 nodes).

Modifying the HIRB with a set or delete operation on some label involves first computing the height
of the label. The height is determined by sampling from a geometric distribution with probability
$(\beta - 1)/\beta$, which we derandomize by using a pseudorandom sequence based on Hash(label). The distribution
guarantees that, in expectation, the number of items at height 0 (i.e., in the leaves) is $\frac{\beta - 1}{\beta} n$, the number of
items at height 1 is $\frac{\beta - 1}{\beta^2} n$, and so on.

Inserting or removing an element from the HIRB involves (respectively) splitting or merging nodes
along the search path from the height of the item down to the leaf. This differs from a typical B-tree in that
rather than inserting items at the leaf level and propagating up or down with splitting or merging, the HIRB
tree requires that the heights of all items are fixed. As a result, insertions and deletions occur at the selected
height within the tree according to the label hash. A demonstration of this process is provided in Figure 7.

In a HIRB tree with height $H$, each get operation requires reading exactly $H + 1$ nodes from the
vORAM, and each set or delete operation involves reading and writing at most $2H + 1$ nodes. To support
obliviousness, each operation will require exactly $2H + 1$ node accesses, accomplished by padding with
"dummy" accesses so that every operation has an indistinguishable access pattern.

One way of reading and updating the nodes along the search path would be to read all $2H + 1$ HIRB
nodes from the vORAM and store them in temporary memory and then write back the entire path after any
update. However, properties of the HIRB tree enable better performance because the height of each HIRB
tree element is uniquely determined, which means we can perform the updates on the way down in the search
path. This only requires 2 HIRB tree nodes to be stored in local memory at any given time.

Figure 7: HIRB insertion/deletion of X = (Hash(label), value). On the left is the HIRB without item X, displaying
only the nodes along the search path for X, and on the right is the state of the HIRB with X inserted. Observe that the
insertion operation (left to right) involves splitting the nodes below X in the HIRB, and the deletion operation (right
to left) involves merging the nodes below X.

Facilitating this extra efficiency requires considerable care in the implementation due to the nature of
vORAM identifiers; namely, each internal node must be written back to vORAM before its children nodes
are fetched. Fetching children nodes will change their vORAM identifiers and invalidate the pointers in the
parent node. The solution is to pre-generate new random identifiers of the child nodes before they are even
accessed from the vORAM.

The full details of the HIRB operations can be found in Appendix E.

HIRB tree properties. For our analysis of the HIRB tree, we first need to understand the distribution of
items among each level in the HIRB tree. We assume a subroutine chooseheight(label) evaluates a random
function on label to generate random coins, using which it samples from a truncated geometric distribution
with maximum value $H$ and probability $(\beta - 1)/\beta$.

Assumption 6. If $\mathrm{label}_1, \ldots, \mathrm{label}_n$ are any $n$ distinct labels stored in a HIRB, then the heights

chooseheight($\mathrm{label}_1$), ..., chooseheight($\mathrm{label}_n$)

are independent random samples from a truncated geometric distribution over $\{0, 1, \ldots, H\}$ with
probability $(\beta - 1)/\beta$, where the randomness is determined entirely by the random oracle and the random
function upon creation of the HIRB.

In practice, the random coins for chooseheight(label) are prepared by computing coins = PRG(SHA1(seed || label)),
where seed is a global random seed, and PRG is a pseudorandom generator. With SHA1 modeled as a
random oracle, the coins will be pseudorandom.
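
A minimal Python sketch of this derandomized height selection is given below. It follows the recipe coins = PRG(SHA1(seed || label)) in spirit only: Python's `random.Random` (a Mersenne Twister) is used here as a non-cryptographic stand-in for the PRG, and the function name and the assumption that $\beta$ is an integer are ours. A real implementation would substitute a cryptographic PRG.

```python
import hashlib
import random

def choose_height(label, seed, beta, H):
    """Sample a height in {0, ..., H} from a truncated geometric distribution.

    The coins depend only on (seed, label), so the same label always
    receives the same height. random.Random is a stand-in PRG (assumption).
    """
    digest = hashlib.sha1(seed + label).digest()
    coins = random.Random(int.from_bytes(digest, "big"))
    height = 0
    # Move up one level with probability 1/beta, i.e. stop with
    # probability (beta - 1)/beta, and never exceed the maximum height H.
    while height < H and coins.randrange(beta) == 0:
        height += 1
    return height
```

Under this sampling rule about a $(\beta - 1)/\beta$ fraction of labels land at height 0, and the fraction shrinks by a factor of $\beta$ with each additional level.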

Theorem 7. The HIRB tree is a dictionary data structure that associates arbitrary labels to values. If it
contains $n$ items, and has height $H \ge \log_\beta n$, and the nodes are stored in a vORAM, then the following
properties hold:

• The probability of failure in any operation is at most $2^{-\gamma}$.

• Each operation requires exactly $2H + 1$ node accesses, only 2 of which need to be stored in temporary
memory at any given time.

• The data structure itself, not counting the pointers, is strongly history independent.
The first property follows from the fact that the only way the HIRB tree can fail to work properly is if
there is a hash collision. Based on the hash length defined above, the probability that any two of the $n$
labels in the HIRB collide is at most $2^{-\gamma}$. The second property follows from the description of
the operations get, set, and delete, and is crucial not only for the performance of the HIRB but also for the
obliviousness property. The third property is a consequence of the fact that the HIRB is uniquely represented
up to the pointer values, after the hash function is chosen at initialization.

vORAM+HIRB properties. We are now ready for the main theoretical results of the paper, which have to
do with the performance and security guaranteed by the vORAM+HIRB construction. These results follow
in a straightforward way from the results we have already stated on vORAM and on the HIRB, so we leave
their proofs to Appendix B.

Theorem 8. Suppose a HIRB tree with $n$ items and height $H$ is stored within a vORAM with $T$ levels, bucket
size $Z$, and stash size $R$. Given choices for $Z$ and $\gamma > 0$, set the parameters as follows:

$$T \ge \lg(4n + \lg n + \gamma)$$
$$\beta = \max\{\beta \mid Z \ge 20 \cdot \mathrm{nodesize}_\beta\}$$
$$R \ge \gamma \cdot \mathrm{nodesize}_\beta$$
$$H \ge \log_\beta n$$

Then the probability of failure due to stash overflow or collisions after each operation is at most

$$\Pr[\text{vORAM+HIRB failure}] \le 30 \cdot (0.883)^{\gamma}.$$

The parameters follow from the discussion above. Again note that the constants 30 and 0.883 are 
technical artifacts of the analysis. 
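
The parameter choices in Theorem 8 can be turned into a small calculator. The Python sketch below is one possible reading, with several simplifying assumptions that are ours: the hash length is passed in as a fixed input (the paper's definition depends on $H$ and $\beta$, which would make the selection circular), $T$ and $H$ are rounded up to integers, all sizes are in bits, and the function and key names are hypothetical.

```python
import math

def choose_parameters(n, Z, gamma, hash_len, value_bits, ratio=20):
    """Pick T, beta, H and the stash size following Theorem 8 (a sketch).

    `ratio` is the constant 20 in Z >= 20 * nodesize_beta.
    """
    T = math.ceil(math.log2(4 * n + math.log2(n) + gamma))
    id_bits = 2 * T + gamma + 1
    per_entry = id_bits + hash_len + value_bits
    # nodesize_beta = id_bits + beta * per_entry, so the largest beta with
    # ratio * nodesize_beta <= Z is given by integer division.
    beta = (Z // ratio - id_bits) // per_entry
    if beta < 2:
        raise ValueError("bucket size Z is too small for these parameters")
    node_bits = (beta + 1) * id_bits + beta * (hash_len + value_bits)
    H = 1
    while beta ** H < n:  # smallest integer H with H >= log_beta(n)
        H += 1
    return {"T": T, "beta": beta, "H": H,
            "nodesize": node_bits, "stash_bits": gamma * node_bits}
```

Passing `ratio=6` instead of 20 corresponds to the smaller constant that the experiments in Section 6 suggest is sufficient in practice.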

Theorem 9. Suppose a vORAM+HIRB is constructed with parameters as above. The vORAM+HIRB
provides obliviousness, secure deletion, and history independence with leakage of $O(n + n\lambda/(\log n))$
operations.

The security properties follow from the previous results on the vORAM and the HIRB. Note that the
HIRB structure itself provides history independence with no leakage, but when combined with the vORAM,
the pointers may leak information about recent operations. The factor $O(\log n)$ difference from the amount
of leakage from vORAM in Theorem 4 arises because each HIRB operation entails $O(\log n)$ vORAM
operations. Following the discussion after Theorem 4, we could also reduce the leakage in vORAM+HIRB
to $O(n / \log^2 n)$, with a constant-factor increase in bandwidth, which again is optimal according to Theorem 1.

6 Evaluation 

We completed two empirical analyses of the vORAM+HIRB system. First, we sought to determine the most 
effective size for vORAM buckets with respect to the expected block size, i.e., the ratio Z/B. Second, we 
made a complete implementation of the vORAM+HIRB and measured its performance in storing a realistic 
dataset of key/value pairs of 22MB in size. The complete source code of our implementation is available 
upon request. 

6.1 Optimizing vORAM parameters 

A crucial performance parameter in our vORAM construction is the ratio Z/B between the size Z of each
bucket and the expected size B of each block. (Note that $B = \mathrm{nodesize}_\beta$ when storing HIRB nodes within
the vORAM.) This ratio is a constant factor in the bandwidth of every vORAM operation and has a
considerable effect on performance. In the Path ORAM, the best corresponding theoretical ratio is 5, whereas it
has been shown experimentally that a ratio of 4 will also work, even in the worst case [37].

Figure 8: Maximum stash size, scaled by log n, observed across 50 simulations of a vORAM for various Z/B values.

We performed a similar experimental analysis of the ratio Z/B for the vORAM. Our best theoretical 
ratio from Theorem 5 is 20, but as in related work, the experimental performance is better. The goal is then 
to find the optimal, empirical choice for the ratio Z/B: If Z/B is too large, this will increase the overall 
communication cost of the vORAM, and if it is too small, there is a risk of stash overflow and loss of data 
or obliviousness. 

For the experiments described below, we implemented a vORAM structure without encryption and
inserted a chosen number of variable-size blocks whose sizes were randomly sampled from a geometric
distribution with expected size 68 bytes. To avoid collisions, we ensured the identifier lengths satisfied
$\gamma > 40$.

Stash size. To analyze the stash size for different Z/B ratios, we ran a number of experiments and
monitored the maximum stash size observed at any point throughout the experiment. Recall that, while the stash
will typically be empty after every operation, the max stash size should grow logarithmically with respect to
the number of items inserted in the vORAM. The primary results are presented in Figure 8.

This experiment was conducted by running 50 simulations of a vORAM with n insertions and a height
of $T = \lg n$. The Z/B value ranged from 1 to 50, and results in the range 1 through 12 are presented in the
graph for several values of n. The graph plots the ratio $R / \lg n$, where R is the largest stash size observed
at any point in any of the 50 simulations. Observe that by $Z/B = 6$ the ratio has stabilized for all values
of n, indicating a maximum stash of approximately $100 \lg n$.

In order to measure how much stash would be needed in practice for much larger experimental runs,
we fixed $Z/B = 6$ and tested three large database sizes n, the largest being $n = 2^{20}$. For each size, we
executed 2n operations, measuring the size of the stash after each. In practice, as we would assume from the
theoretical results, the stash size is almost always zero. However, the stash does occasionally become
non-empty, and it is precisely the frequency and size of these rare events that we wish to measure.

Figure 9 shows the result of our stash overflow experiment. We divided each test run of roughly 2n 
operations into roughly n overlapping windows of n operations each, and then for each window, and each 
possible stash size, calculated the number of operations before the first time that stash size was exceeded. 
The average number of operations until this occurred, over all n windows, is plotted in the graph. The data
shows a linear trend in log-scale, meaning that the stash size necessary to ensure low overflow probability
after N operations is O(log N), as expected. Furthermore, in all experiments we never witnessed a stash
size larger than roughly 10KB, whereas the theoretical bound of 100 lg n items would be 16KB for the
largest test with 8-byte items.

Figure 9: Average time until stash overflow, for varying vORAM and stash sizes. Stash size is linear-scale, number
of operations in log-scale. Higher is better. For each vORAM size n, we performed 2n operations to gather sufficient
experimental data.

Bucket utilization. Stash size is the most important parameter of vORAM, but it provides a limited view
into the optimal bucket size ratio, in particular because the stash size is typically zero after every operation
for sufficiently large buckets. We measured the utilization of buckets at different levels of the vORAM with
varied heights and Z/B values. The results are presented in Figure 10 and were collected by averaging
the final bucket utilization from 10 simulations. The utilization at each level is measured by dividing the
number of bytes stored at the level by the total storage capacity of the level. In all cases, n = 2^15 elements
were inserted, and the vORAM height varied between 14, 15, and 16. The graph shows that when the height is
lg n = 15 or higher and Z/B is 6 or higher, utilization stabilizes throughout all the levels (with only a small
spike at the leaf level).

The results indicate, again, that when Z/B = 6, the utilization at the interior buckets stabilizes. With
smaller ratios, e.g., Z/B = 4, the utilization of buckets higher in the tree dominates those lower in the tree;
essentially, blocks are not able to reach lower levels, resulting in higher stash sizes (see previous experiment).
With larger ratios, which we measured all the way to Z/B = 13, we observed consistent stabilization.

In addition, our data shows that decreasing the number of levels from lg n to lg n − 1 (e.g., from 15 to
14 in the figure) increases utilization at the leaf nodes as expected (as depicted in the spike in the tail of the
graphs), but when Z/B > 6 the extra blocks in leaf nodes do not propagate up the tree and affect the stash.
It therefore appears that in practice, the number of levels T could be set to lg n − 1, which will result in
a factor of 2 savings in the size of persistent (cloud) storage due to high utilization at the leaf nodes. This
follows a similar observation about the height of the Path ORAM made by [37].
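The factor of 2 follows from the size of a complete binary tree, as the short calculation below illustrates. It is only a sketch: the helper names are hypothetical, the heights 15 and 14 are the ones discussed above, and the 4096-byte bucket is the 4K size used in Section 6.2.

```python
def num_buckets(height):
    """Number of buckets in a complete binary tree with levels 0..height."""
    return 2 ** (height + 1) - 1

def storage_bytes(height, bucket_bytes):
    """Total persistent storage when every bucket has the same fixed size."""
    return num_buckets(height) * bucket_bytes

full = storage_bytes(15, 4096)       # T = lg n for n = 2^15
reduced = storage_bytes(14, 4096)    # T = lg n - 1
print(full / reduced)                # ratio of the two storage sizes
```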

6.2 Measuring vORAM+HIRB Performance

We measured the performance of our vORAM+HIRB implementation on a real data set of reasonable size,
and compared it to some alternative methods for storing a remote map data structure that provide varying
levels of security and efficiency. All of our implementations used the same client/server setup, with a
Python3 implementation and AWS as the cloud provider, in order to give a fair comparison.

Figure 10: Utilization at different levels of ORAM

Sample dataset. We tested the performance of our implementation on a dataset of 300,000 synthetic
key/value pairs where keys were of variable size (on the order of 10-20 bytes) and values were
fixed at 16 bytes. The total unencrypted data set is 22MB in size. In our experiments, we used some subset of
this data dependent on the size of the ORAM, and for each size, we also assumed that the ORAM user would
want to allow the database to grow. As such, we built the ORAM to hold double the size of the initialization.

Optimized vORAM+HIRB implementation. We fully implemented our vORAM+HIRB map data structure
using Python3 and Amazon Web Services as the cloud service provider. We used AES256 for encryption
in vORAM, and used SHA1 to generate labels for the HIRB. In our setting, we considered a client running
on the local machine that maintained the erasable memory, and the server (the cloud) provided the persistent
storage with a simple get/set interface to store or retrieve a given (encrypted) vORAM bucket.

For the vORAM buckets, we chose Z/B = 6 based on the prior experiments, and a bucket size of 4K,
which is the preferred back-end transfer size for AWS, and was also the bucket size used by [4]. One of
the advantages of the vORAM over other ORAMs is that the bucket size can be set to match the storage
requirements with high bucket utilization. The settings for the HIRB were then selected based on Theorem 8,
from which we calculated β = 12 for the sample data (labels and values) stored within the HIRB.
The label, value, and associated vORAM identifiers total 56 bytes per item.

In our experiments, we found that the round complexity of the protocols dominates performance, and so we
made a number of improvements and optimizations to the vORAM access routines to compensate. The
result is an optimized version of the vORAM. In particular:

• Parallelization: The optimized vORAM transfers buckets along a single path in parallel over
simultaneous connections for both the evict and writeback methods. Our experiments used up to T threads
in parallel to fetch and send ORAM block files, and each maintained a persistent sftp connection.

• Buffering: A local buffer storing 2T top-most ORAM buckets was used to facilitate asynchronous
path reading and writing by our threads. Note that the size of the client storage still remains O(log n)
since T = O(log n). This had an added performance benefit beyond the parallelization because the
top few levels of the ORAM generally resided in the buffer and did not need to be transferred back
and forth to the cloud after every operation.
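The parallel path transfer can be sketched with a thread pool. This is an illustrative sketch under stated assumptions, not the implementation measured here: `InMemoryStore` is a hypothetical stand-in for the remote get/set interface (the real system used sftp connections), and the bucket naming scheme in `path_names` is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

class InMemoryStore:
    """Hypothetical stand-in for the remote get/set bucket interface."""
    def __init__(self):
        self._data = {}

    def get(self, name):
        return self._data.get(name, b"")

    def set(self, name, blob):
        self._data[name] = blob

def path_names(leaf_index, height):
    """Bucket names from the root (level 0) down to the given leaf (level `height`)."""
    return [f"bucket-{t}-{leaf_index >> (height - t)}" for t in range(height + 1)]

def fetch_path(store, leaf_index, height):
    """Fetch every bucket on one path concurrently; results keep root-to-leaf order."""
    names = path_names(leaf_index, height)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(store.get, names))

store = InMemoryStore()
for name in path_names(5, 3):
    store.set(name, name.encode())
print(fetch_path(store, 5, 3))
```

Because each bucket's key is held in its parent, only the transfers run concurrently in such a design; decryption would still proceed from the root downward.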

These optimizations had a considerable effect on the performance. We did not include the cost of the
≈ 2 second setup/teardown time for these SSL connections in our results, as these were a one-time cost
incurred at initialization. Many similar techniques have been used in previous work to achieve
similar performance gains (e.g., [25, 41]), although they have not been previously applied to oblivious data
structures.

Comparison baselines. We compared our optimized vORAM+HIRB construction with four other
alternative implementations of a remote map data structure, with a wide range of performance and security
properties:

• Un-optimized vORAM+HIRB: This is the same as our normal vORAM+HIRB construction, but
without any buffering of vORAM buckets and with only a single concurrent sftp connection. This
comparison allows us to see what gains are due to the algorithmic improvements in vORAM and HIRB,
and which are due to the network optimizations.

• Naive Baseline: We implemented a naive approach that provides all three security properties:
obliviousness, secure deletion, and history independence. The method involves maintaining a single,
fixed-size encrypted file transferred back and forth between the server and client and re-encrypted on each
access. While this solution is cumbersome for large sizes, it is the obvious solution for small databases
and thus provides a useful baseline. Furthermore, we are not aware of any other method (other than
vORAM+HIRB) to provide obliviousness, secure deletion, and history independence.

• ORAM+AVL: We implemented the ODS proposed by [40] of an AVL tree embedded within a
non-recursive Path ORAM. Note that ORAM+AVL provides neither secure deletion nor history
independence. We used the same cryptographic settings as our vORAM+HIRB implementation, and used
256 byte blocks for each AVL node, which was the smallest size we could achieve without additional
optimizations. As recommended by [37], we stored Z = 4 fixed-size blocks in each bucket, for a total
of 1K bucket size. Note that this bucket size is less than the 4K transfer size recommended by the
cloud storage, which reflects the limitation of ORAM+AVL in that it cannot effectively utilize larger
buckets. We add the observation that, when the same experiments were run with 4K size buckets
(more wasted bandwidth, but matching the other experiments), the timings did not change by more
than 1 second, indicating that the 4K bucket size is a good choice for the AWS back-end.

• SD-B-Tree: As another comparison point, we implemented a remotely stored B-Tree with secure
deletion where each node is encrypted with a key stored in the parent with re-keying for each access,
much as described by Reardon et al. [34]. While this solution provides secure deletion, and stores all
data encrypted, it provides neither obliviousness nor history independence. Again, we used AES256
encryption, with β = 110 for the B-tree max internal node size in order to optimize 4K-size blocks.

In terms of security, only our vORAM+HIRB as well as the naive baseline provide obliviousness, secure
deletion, and history independence. The ORAM+AVL provides obliviousness only, and the SD-B-Tree is
most vulnerable to leaking information in the cloud, as it provides secure deletion only.
Figure 11: Median of 100 access times for different number of entries


In terms of asymptotic performance, the SD-B-Tree is fastest, requiring only O(log n) data transfer per
operation. The vORAM+HIRB and ORAM+AVL both require O(log^2 n) data transfer per operation,
although as discussed previously the vORAM+HIRB saves a considerable constant factor. The naive baseline
requires O(n) transfer per operation, albeit with the smallest possible constant factor.

Experimental results. The primary result of the experiment is presented in Figure 11, where we compared
the cost of a single access time (in seconds) across the back end storage (note that the graph is log-log).
Unsurprisingly, the SD-B-Tree implementation is fastest for sufficiently large database sizes. However, our
optimized vORAM+HIRB implementation was competitive with the SD-B-Tree performance, both being less
than 1 second across our range of experiments.

Most striking is the access time of ORAM+AVL compared to the vORAM+HIRB implementations.
In both the optimized and un-optimized setting, the vORAM+HIRB is orders of magnitude faster than
ORAM+AVL: 20X faster un-optimized and 100X faster when optimized. Even for a relatively small number
of entries such as 2^^, a single access of ORAM+AVL takes 35 seconds, while it only requires 1.3 seconds
for un-optimized vORAM+HIRB and 0.2 seconds for the optimized implementation. It is not until 2^® entries
that ORAM+AVL even outperforms the naive O(n) baseline solution.

As described previously, we attribute much of the speed to decreasing the round complexity. The HIRB
tree requires much smaller height as compared to an AVL tree because each HIRB node contains β items on
average as compared to just a single item for an AVL tree. Additionally, the HIRB's height is fixed and does
not require padding to achieve obliviousness. Each AVL operation entails 3 · 1.44 lg N ORAM operations
as compared to just 2 log_β N vORAM operations for the HIRB. This difference in communication cost is
easily observed in Table 1. Overall, we see that the storage and communication costs for vORAM+HIRB
are not too much larger than those for a secure deletion B-tree, which does not provide any access pattern
hiding as the oblivious alternatives do.
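To make the two operation counts concrete, the following sketch evaluates both formulas for a few database sizes. It assumes β = 12, the value reported above for the sample data; the function names are hypothetical.

```python
import math

def avl_oram_ops(n):
    """ORAM operations per AVL map operation: 3 * 1.44 * lg N."""
    return 3 * 1.44 * math.log2(n)

def hirb_voram_ops(n, beta):
    """vORAM operations per HIRB map operation: 2 * log_beta N."""
    return 2 * math.log(n, beta)

for exp in (10, 20, 30):
    n = 2 ** exp
    print(exp, round(avl_oram_ops(n), 1), round(hirb_voram_ops(n, 12), 1))
```

The ratio between the two counts is the same for every N, so the gap reflects the larger fan-out of the HIRB rather than the database size.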

(The values in this table were generated by considering the worst-case costs in all cases, for our actual
implementations, but considering only a single operation. Note that, for constructions providing
obliviousness, every operation must actually follow this worst-case cost, and so the comparison is fair.)

Put simply, the vORAM+HIRB and SD-B-Tree are the only implementations which can be considered
practical for real data sizes, and the benefit of vORAM+HIRB is its considerable additional security
guarantees of obliviousness and bounded history independence.
Size 2^10                 Total storage    Bandwidth    Rounds
Naive baseline            8.2 KB           8.2 KB       1
Secure deletion B-tree    36.9 KB          12.3 KB      2
ORAM+AVL                  8.4 MB           4.0 MB       968
vORAM+HIRB                127.0 KB         102.4 KB     3

Size 2^15                 Total storage    Bandwidth    Rounds
Naive baseline            262.1 KB         262.1 KB     1
Secure deletion B-tree    1.1 MB           20.5 KB      3
ORAM+AVL                  268.4 MB         8.6 MB       2096
vORAM+HIRB                4.2 MB           286.7 KB     4

Size 2^20                 Total storage    Bandwidth    Rounds
Naive baseline            8.4 MB           8.4 MB       1
Secure deletion B-tree    33.8 MB          20.5 KB      3
ORAM+AVL                  8.6 GB           15.1 MB      3675
vORAM+HIRB                134.2 MB         553.0 KB     5

Size 2^25                 Total storage    Bandwidth    Rounds
Naive baseline            268.4 MB         268.4 MB     1
Secure deletion B-tree    1.1 GB           28.7 KB      4
ORAM+AVL                  274.9 GB         23.2 MB      5668
vORAM+HIRB                4.3 GB           901.1 KB     6

Size 2^30                 Total storage    Bandwidth    Rounds
Naive baseline            8.6 GB           8.6 GB       1
Secure deletion B-tree    34.6 GB          36.9 KB      5
ORAM+AVL                  8.8 TB           33.3 MB      8122
vORAM+HIRB                137.4 GB         1.5 MB       8

Table 1: Storage and communication cost comparisons. Total storage is the amount of space required for the server,
and the bandwidth and rounds are counted per operation. Each stored item consists of a 4-byte label and 4-byte value.

7 Conclusion

In this paper, we have shown a new secure cloud storage system combining the previously disjoint security
properties of obliviousness, secure deletion, and history independence. This was accomplished by
developing a new variable block size ORAM, or vORAM, and a new history independent, randomized data
structure (HIRB) to be stored within the vORAM.

The theoretical performance of our vORAM+HIRB construction is competitive with existing systems
which provide fewer security properties. Our implemented system is up to 100X faster (w.r.t. access time)
than the current best oblivious map data structure (which provides no secure deletion or history independence)
by Wang et al. (CCS 14), bringing our single-operation time for a reasonable-sized database (> 2^®) to less
than 1 second per access.

There is much potential for future work in this area. For example, one could consider data structures that
support a richer set of operations, such as range queries, while preserving obliviousness, secure deletion, and
history independence. Additionally, the vORAM construction in itself may provide novel and exciting new
analytic results for ORAMs generally by not requiring fixed bucket sizes. There is a potential to improve the
overall utilization and communication cost compared to existing ORAM models that use fixed-size blocks.

Finally, while we have demonstrated the practicality in terms of overall per-operation speed, we did
not consider some additional practical performance measures as investigated by [4], such as performing
asynchronous operations and optimizing upload vs. download rates. Developing an ODS map considering
these concerns as well would be a useful direction for future work.
A vORAM Operation Details

The full detail of the vORAM helper functions is provided in Figure 12, and the three main operations are
shown in Figure 13.
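As a rough companion to the pseudocode of this appendix, the following toy model mimics evict, writeback, insert, and remove in Python. It is a sketch under several assumptions rather than the implementation evaluated in Section 6: there is no encryption or key chaining, metadata overhead is ignored, a bucket at level t is located by the top t bits of the identifier, and the class name `ToyVORAM` and the parameter `extra_id_bits` are hypothetical.

```python
import secrets

class ToyVORAM:
    """Unencrypted toy vORAM: variable-size blocks split across fixed-size buckets."""

    def __init__(self, height, bucket_bytes, extra_id_bits=40):
        self.T = height                      # levels 0 (root) .. T (leaves)
        self.Z = bucket_bytes                # payload capacity of one bucket
        self.id_bits = height + extra_id_bits
        self.tree = {}                       # (level, index) -> [(id, partial block)]
        self.stash = {}                      # id -> bytes held by the client

    def idgen(self):
        return secrets.randbits(self.id_bits)

    def loc(self, ident, t):
        # Simplification: the bucket index at level t is the top t bits of the id.
        return (t, ident >> (self.id_bits - t))

    def evict(self, ident):
        # Read the path root-to-leaf, merging partial blocks into the stash.
        for t in range(self.T + 1):
            for pid, part in self.tree.pop(self.loc(ident, t), []):
                self.stash[pid] = self.stash.get(pid, b"") + part

    def writeback(self, ident):
        # Refill the path leaf-to-root; the tail of a split block goes deepest.
        for t in range(self.T, -1, -1):
            bucket, free = [], self.Z
            for pid in list(self.stash):
                if free == 0:
                    break
                if self.loc(pid, t) != self.loc(ident, t):
                    continue
                blk = self.stash.pop(pid)
                cut = max(0, len(blk) - free)     # bytes that do not fit stay in the stash
                bucket.append((pid, blk[cut:]))
                free -= len(blk) - cut
                if cut:
                    self.stash[pid] = blk[:cut]
            self.tree[self.loc(ident, t)] = bucket

    def insert(self, blk):
        id0 = self.idgen()                   # random path to evict
        self.evict(id0)
        new_id = self.idgen()
        self.stash[new_id] = blk
        self.writeback(id0)
        return new_id

    def remove(self, ident):
        self.evict(ident)
        blk = self.stash.pop(ident)
        self.writeback(ident)
        return blk

oram = ToyVORAM(height=4, bucket_bytes=64)
ident = oram.insert(b"variable-length value " * 5)   # longer than one bucket
print(oram.remove(ident) == b"variable-length value " * 5)
```

The sketch keeps the invariant that a block equals its stash piece followed by its tree pieces in root-to-leaf order, which is why merging on evict and splitting from the tail on writeback reassemble the original bytes.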

idgen()
  1: Choose r ← {0,1}^(T+γ)
  2: return r

loc(id, t)
  1: return the location of the node at level t along the path from the root to the leaf node identified by id.
     This is simply the index indicated by the (t + 1) most significant bits of id.

evict(id)
  1: key ← rootkey                                   ▷ rootkey: enc key for root bucket
  2: B ← empty list
  3: for t = 0, 1, ..., T do
  4:     remove bucket at loc(id, t) from persistent storage; decrypt it with key
  5:     Append all partial blocks in the bucket to the end of B
  6:     key ← child key from bucket according to loc(id, t + 1)
  7: end for
  8: for each partial block (id*, ℓ, blk) in B do
  9:     if (id*, ℓ0, blk0) is in stash already
 10:         then replace with (id*, ℓ0 + ℓ, blk0 ‖ blk)   ▷ merge
 11:         else Add (id*, ℓ, blk) to stash
 12: end for

writeback(id)
  1: key ← nil
  2: for t = T, T − 1, ..., 0 do
  3:     W ← {(id*, ℓ, blk) ∈ stash : loc(id*, t) = loc(id, t)}
                                                     ▷ W is the partial blocks storable in the bucket
  4:     create empty bucket with new child key key (other child key remains the same)
  5:     while W is not empty and bucket is not full do
  6:         (id*, ℓ, blk) ← arbitrary element from W
  7:         (id*, ℓ1, blk1) ← largest partial block of the above, fitting
             in the bucket with blk = blk0 ‖ blk1 and |blk1| = ℓ1.
  8:         Add (id*, ℓ1, blk1) to the bucket
  9:         if ℓ1 = ℓ
 10:             then remove (id*, ℓ, blk) from W and from stash
 11:             else replace (id*, ℓ, blk) in stash with (id*, ℓ − ℓ1, blk0).   ▷ split a partial block
 12:     end while
 13:     key ← fresh encryption key chosen uniformly at random
 14:     insert Enc_key(bucket) at loc(id, t) in persistent storage.
 15: end for
 16: rootkey ← key

Figure 12: vORAM helper functions

Pre-Print version 2015-11-23

insert(blk)
  1: id0 ← idgen()
  2: evict(id0)
  3: id+ ← idgen()
  4: insert (id+, |blk|, blk) into stash
  5: writeback(id0)
  6: return id+

remove(id)
  1: evict(id)
  2: remove (id, ℓ, blk) from stash
  3: writeback(id)
  4: return blk

update(id, callback)
  1: evict(id)
  2: remove (id, ℓ, blk) from stash
  3: id+ ← idgen()
  4: blk+ ← callback(blk)
  5: insert (id+, |blk+|, blk+) into stash
  6: writeback(id)
  7: return id+

Figure 13: vORAM operations

B HIRB Operation Details

We described the HIRB data structure in Section 5. The full details of the different subroutines are provided
in Figures 14 and 15.

All the HIRB tree operations depend on a subroutine HIRBpath, which, given a label hash, HIRB root
node identifier, and vORAM, generates tuples (ℓ, v0, v1, cid1+) corresponding to the search path for that
label in the HIRB. In each tuple, ℓ is the level of node v0, which is along the search path for the label. In the
initial part of the search path, that is, before the given label hash is found, node v1 is always nil, a dummy
access used to preserve obliviousness. The value cid1+ is the pre-generated identifier of the new node that
will be inserted on the next level, for possible inclusion in one of the parent nodes as a child pointer. This
pre-generation is important, as discussed in Section 5, so that only 2 nodes need to be stored in local memory
at any given time.

When the given label hash is found, the search path splits into two below that node, and nodes v0 and v1
will be the nodes on either side of that hash label. Note that in the actual implementation of HIRBpath, v0
(resp. v1, if defined) corresponds to a vORAM block, evicted with identifier id0 (resp. id1) and taken out
from the vORAM stash. When each tuple (ℓ, v0, v1, cid1+) is returned from the generator, the two nodes can be
modified by the calling function, and the modified nodes will be written back to the HIRB. If v1 is returned
from HIRBpath as nil, but is then modified to be a normal HIRB node, that new node is subsequently inserted
into the HIRB.

The update operation simply looks in each returned v0 along the search path for the existence of the
indicated label hash, and if found, the corresponding data value is passed to the callback function, possibly
modifying it.

As with update, the insert operation uses subroutine HIRBpath as a generator to traverse the HIRB tree.
Inserting an element into the HIRB involves splitting nodes along the search path from the height of the
item down to the leaf. That is, for each tuple (ℓ, v0, v1, cid1+) with ℓ > ℓ_h, where ℓ_h is the height of the
label hash h, if v1 is nil, then a new node v1 is created, and the items in v0 with a label greater than h are
moved to the new node v1.

The remove operation works similarly, but instead of splitting each v0 below the level of the found item,
the values in v0 and v1 are merged into v0, and v1 is removed by setting it to nil.

C Proofs of Important Theorems 

Complete proofs of our main theorems are given here. 
HIRBpath(h, rootid, M)                               ▷ M is vORAM
  1: (id0, id0+) ← (rootid, M.idgen())
  2: rootid ← id0+
  3: (id1, id1+) ← (M.idgen(), M.idgen())            ▷ dummy access
  4: found ← false
  5: for ℓ = 0, 1, 2, ..., H do
  6:     M.evict(id0)
  7:     M.evict(id1)
  8:     if ℓ = H then (cid0+, cid1+) ← (nil, nil)
  9:     else (cid0+, cid1+) ← (M.idgen(), M.idgen())
 10:     remove (id0, |v0|, v0) from M.stash
 11:     if found = true then
 12:         remove (id1, |v1|, v1) from M.stash
 13:         (cid0, v0.child_last) ← (v0.child_last, cid0+)
 14:         (cid1, v1.child_0) ← (v1.child_0, cid1+)   ▷ v1 is right next to v0 at level ℓ
 15:     else
 16:         v1 ← nil                                ▷ only fetched after the target is found.
 17:         i ← index of h in v0                    ▷ v0.h_{i−1} < h ≤ v0.h_i
 18:         (cid0, v0.child_i) ← (v0.child_i, cid0+)
 19:         if v0.h_i = h then
 20:             found ← true
 21:             (cid1, v0.child_{i+1}) ← (v0.child_{i+1}, cid1+)
                                                     ▷ split path: cid0 = v0.child_i, cid1 = v0.child_{i+1}
 22:         else
 23:             cid1 ← M.idgen()                    ▷ dummy access until found
 24:         end if
 25:     end if
 26:     yield (ℓ, v0, v1, cid1+)                    ▷ Return to the caller, who may modify nodes.
 27:     insert (id0+, |v0|, v0) into M.stash
 28:     if v1 ≠ nil then insert (id1+, |v1|, v1) into M.stash
 29:     M.writeback(id0)
 30:     M.writeback(id1)
 31:     (id0, id0+) ← (cid0, cid0+)
 32:     (id1, id1+) ← (cid1, cid1+)
 33: end for

Figure 14: Fetching the nodes along a search path in the HIRB
hirbinit(H, M)
  1: rootid ← nil
  2: salt ← uniformly random bit string. Initialize Hash with salt.
  3: for ℓ = H, H − 1, ..., 0 do
  4:     node ← new 1-ary HIRB node with child id rootid
  5:     rootid ← M.insert(node)
  6: end for
  7: return rootid

chooseheight(label)
  1: h ← Hash(label)
  2: Choose coins (c0, c1, ..., c_{H−1}) ∈ {0, 1, ..., β − 1}^H by evaluating PRG(h).
  3: return the largest integer ℓ ∈ {0, 1, ..., H} such that c1 = c2 = ··· = cℓ = 0.

insert(label, value, rootid, M)
  1: (h, ℓ_h) ← (Hash(label), chooseheight(label))
  2: for (ℓ, v0, v1, cid1+) ∈ HIRBpath(h, rootid, M) do
  3:     i ← index of h in v0                        ▷ v0.h_{i−1} < h ≤ v0.h_i
  4:     if v0.h_i = h then
  5:         v0.value_i ← value
  6:     else if ℓ = ℓ_h then
  7:         Insert (h, value, cid1+) before (v0.h_i, v0.value_i, v0.child_i)
                                                     ▷ Other items in v0 are shifted over
  8:     else if ℓ > ℓ_h and v1 = nil then
  9:         v1 ← new node with v1.child_0 ← cid1+
 10:         Move items in v0 past index i into v1
 11:     end if
 12: end for

remove(label, rootid, M)
  1: (h, ℓ_h) ← (Hash(label), chooseheight(label))
  2: for (ℓ, v0, v1, cid1+) ∈ HIRBpath(h, rootid, M) do
  3:     if h ∈ v0 then
  4:         Remove h and its associated value and subtree from v0
  5:     else if ℓ > ℓ_h and v1 ≠ nil then
  6:         Add all items in v1 except v1.child_0 to v0
  7:         v1 ← nil
  8:     end if
  9: end for

update(label, callback, rootid, M)
  1: h ← Hash(label)
  2: for (ℓ, v0, v1, cid1+) ∈ HIRBpath(h, rootid, M) do
  3:     i ← index of h in v0
  4:     if v0.h_i = h then v0.value_i ← callback(v0.value_i)
  5: end for

Figure 15: Description of HIRB tree operations.
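The height selection in chooseheight can be illustrated with a short Python sketch. Several details here are assumptions made for the example: the height is taken to be the length of the initial run of zero coins, Python's `random.Random` seeded with the hash stands in for the PRG (it is not a cryptographic generator), the salt is simply prepended to the label before hashing with SHA1, and the name `choose_height` and its parameters are hypothetical.

```python
import hashlib
import random

def choose_height(label, salt, beta, max_height):
    """Count the initial run of zero coins, each coin uniform on {0, ..., beta - 1}."""
    h = hashlib.sha1(salt + label).digest()    # salted label hash
    prg = random.Random(h)                     # stand-in PRG seeded by the hash
    height = 0
    while height < max_height and prg.randrange(beta) == 0:
        height += 1
    return height

salt = b"example-salt"                          # illustrative value only
heights = [choose_height(str(i).encode(), salt, 12, 5) for i in range(20000)]
print(sum(1 for x in heights if x >= 1) / len(heights))   # compare with 1/12
```

Because the coins depend only on the salted hash, the same label always receives the same height, regardless of the order in which items are inserted.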
C.1 Proof of Theorem 1

Let $\mathcal{D}$ be any system that stores blocks of data in persistent storage and erasable memory and supports insert
and remove operations, accessing at most $k$ bytes in persistent or local storage in each insert or remove
operation.

Let $n > 36$ and $k < \sqrt{n}/2$. For any $\ell < n/(4k)$, we describe a PPT adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ that
breaks history independence with leakage of $\ell$ operations.

Supposing all operations are insertions, $\mathcal{D}$ must access the location where that item's data is actually to
be stored during execution of the insert operation, which is required to correctly store the data somehow.
However, it may access some other locations as well to "hide" the access pattern from a potential attacker.
This hiding is limited of course by $k$, which we will now exploit.

The "chooser", $\mathcal{A}_1$, randomly chooses $n$ items which will be inserted; these could simply be random
bit strings of equal length. Call these items (and their arbitrary order) $a_1, a_2, \ldots, a_n$. The chooser also
randomly picks an index $j \in \{1, 2, \ldots, n-\ell-1\}$ from the beginning of the sequence. The first operation
sequence returned by $\mathcal{A}_1$ consists of $n$ insertion operations for $a_1, \ldots, a_n$ in order:

$$a_1, \ldots, a_{j-1}, a_j, a_{j+1}, \ldots, a_{n-\ell-1}, a_{n-\ell}, a_{n-\ell+1}, \ldots, a_n,$$

whereas the second operation sequence returned by $\mathcal{A}_1$ contains the same $n$ insertions, with only the
order of the $j$'th and $(n-\ell)$'th insertions swapped:

$$a_1, \ldots, a_{j-1}, a_{n-\ell}, a_{j+1}, \ldots, a_{n-\ell-1}, a_j, a_{n-\ell+1}, \ldots, a_n.$$

The adversary $\mathcal{A}_1$ includes the complete list of $a_1$ up to $a_n$, along with the distinguished index $j$, in the ST
which is passed to $\mathcal{A}_2$. As the last $\ell$ operations are identical (insertion of items $a_{n-\ell+1}$ up to $a_n$), $\mathcal{A}_1$ is
$\ell$-admissible.

The "guesser", $\mathcal{A}_2$, looks back in the last $(\ell+1)k$ entries in the access pattern history of persistent
storage act, and tries to opportunistically decrypt the data in each access entry using the keys from $\mathcal{D}.\mathrm{em}$
(and, recursively, any other decryption keys which are found from decrypting data in the access pattern
history). Some of the data may be unrecoverable, but at least the $\ell+1$ items which were inserted in the last
$\ell+1$ operations must be present in the decryptions, since their data must be recoverable using the erasable
memory. Then the guesser simply looks to see whether $a_j$ is present in the decryptions; if $a_j$ is present then
$\mathcal{A}_2$ returns 1, otherwise if $a_j$ is not present then $\mathcal{A}_2$ returns 0.

In the experiment $\mathrm{EXP}(\mathcal{D}, \mathcal{A}, n, 1, 1)$, $a_j$ must be among the decrypted values in the last $(\ell+1)k$
access entries, since $a_j$ was inserted within the last $\ell+1$ operations and each operation is allowed to trigger
at most $k$ operations on the persistent storage. Therefore $\Pr[\mathrm{EXP}(\mathcal{D}, \mathcal{A}, n, 1, 1) = 1] = 1$.

In the experiment $\mathrm{EXP}(\mathcal{D}, \mathcal{A}, n, 1, 0)$, we know that each item $a_{n-\ell}, \ldots, a_n$ must be present in the
decryptions, and there can be at most $(\ell+1)(k-1)$ other items in the decryptions. Since the index $j$
was chosen randomly from among the first $n-\ell-1$ items in the list, the probability that $a_j$ is among the
decrypted items in this case is at most

$$\frac{(\ell+1)(k-1)}{n-\ell-1}.$$

From the restriction that $\ell < n/(4k)$, and $k < \sqrt{n}/2 < n/12$, we have

$$(\ell+1)(k-1) < (\ell+1)k = \ell k + k < \frac{n}{4} + \frac{n}{12} = \frac{n}{3}.$$

In addition, we have $n-\ell-1 > n/2$, so the probability that $a_j$ is among the decrypted items is at most $2/3$,
and we have $\Pr[\mathrm{EXP}(\mathcal{D}, \mathcal{A}, n, 1, 0) = 1] < 2/3$, and therefore $\mathrm{Adv}(\mathcal{D}, \mathcal{A}, n) > 1/3$. According to
the definition, this means that $\mathcal{D}$ does not provide history independence with leakage of $\ell$ operations.
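The final counting step can be checked numerically. The sketch below enumerates all integer choices of k and ℓ permitted by the restrictions n > 36, k < √n/2, and ℓ < n/(4k), and reports the largest value of (ℓ+1)(k−1)/(n−ℓ−1); the function name is hypothetical and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from math import isqrt

def worst_guess_probability(n):
    """Largest (l+1)(k-1)/(n-l-1) over integers k >= 1, l >= 0 allowed by the proof."""
    worst = Fraction(0)
    for k in range(1, isqrt(n) + 1):
        if 4 * k * k >= n:                 # need k < sqrt(n)/2
            break
        l = 0
        while 4 * k * l < n:               # need l < n/(4k)
            worst = max(worst, Fraction((l + 1) * (k - 1), n - l - 1))
            l += 1
    return worst

for n in (37, 100, 400):
    print(n, float(worst_guess_probability(n)))   # each value stays below 2/3
```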
C.2 Proof of Theorem 5

Our proofs on the distribution of block sizes in the ORAM and on the number of HIRB nodes depend on the
following bound on the sum of geometric random variables. This is a standard type of result along the lines
of Lemma 6 in [40].

Lemma 10. Let $X = \sum_{1 \le i \le n} X_i$ be the sum of $n \ge 1$ independent random variables $X_i$, each
stochastically dominated by a geometric distribution over $\{0, 1, 2, \ldots\}$ with expected value $E[X_i] \le \mu$. Then there
exists a constant $c_0 > 0$ whose value depends only on $\mu$ such that, for any $a \ge 2$ and $b \ge 0$, we have

$$\Pr[X \ge (\mu+1)(an+b)] < \exp(-c_0(an+b)).$$

Proof. By linearity of expectation, $E[X] = \sum_{i \in [n]} E[X_i] \le n\mu$.

Recall that a geometric random variable with expected value $\mu$ is equivalent to the number of
independent Bernoulli trials, each with probability $p = 1/(\mu+1)$, before the first success. If $X \ge (\mu+1)(an+b)$,
this is equivalent to having fewer than $n$ successes over $k = (\mu+1)(an+b)$ independent Bernoulli trials
with probability $p$.

Using this formulation, we can apply the Hoeffding inequality to obtain

$$\Pr[X \ge k] = \Pr[\mathrm{Binomial}(k, p) \le n-1] \le \exp(-2\epsilon^2 k),$$

where $\epsilon$ is defined such that $n-1 = (p-\epsilon)k$; namely

$$\epsilon = p - \frac{n-1}{k} = \frac{1}{\mu+1} - \frac{n-1}{k}.$$

We do some manipulation:

$$2\epsilon^2 k = \frac{2k}{(\mu+1)^2}\left(1 - \frac{(n-1)(\mu+1)}{k}\right)^2 = \frac{2(an+b)}{\mu+1}\left(1 - \frac{n-1}{an+b}\right)^2.$$

Because $a \ge 2$ and $b \ge 0$, we have

$$\frac{n-1}{an+b} < \frac{n}{an} \le \frac{1}{2},$$

and so

$$\exp(-2\epsilon^2 k) < \exp\left(-\frac{1}{2(\mu+1)}(an+b)\right).$$

The stated result follows with the constant defined by

$$c_0 = \frac{1}{2(\mu+1)}. \qquad (1)$$

Outline of proof of Theorem 5. We will mostly follow the proof of the small-stash-size theorem in Path
ORAM [37]. The proof of the theorem consists of several steps.

1. We recall the definition of $\infty$-ORAM (ORAM with infinitely large buckets) and show that stash usage
in an $\infty$-ORAM with post-processing is the same as that in the actual vORAM.

2. We rely on results from the most recent version of [38] to show that the stash usage after
post-processing is greater than $R$ if and only if there exists a subtree for which its usage in $\infty$-ORAM
is more than its capacity.

3. We bound the total size of all blocks in any such subtree by combining two separate measure
concentrations on the number of blocks in any such subtree, and the total size of any fixed number of
variable-length blocks.

4. We complete the proof by connecting the measure concentrations to the actual stash size, in a similar
way to [38].

Note that the first and third steps are those that differ most substantially from prior work, and where we
must incorporate the unique properties of the vORAM.

Proof of Theorem 5. We now give the proof.

Step 1: $\infty$-ORAM. The $\infty$-ORAM is the same as our vORAM tree, except that each bucket has infinite
size. In any writeback operation all blocks will go as far down along the path as their identifier allows.

After simulating a series of vORAM operations on the $\infty$-ORAM, we perform a greedy post-processing
to restore the block size condition:

• Repeatedly select a bucket storing more than $Z$ bytes. Remove a partial block from the bucket, and
let $b$ be the number of remaining bytes stored in the bucket.

• If $Z - b$ is greater than the size of metadata per partial block (identifier and length), then there is some
room left in the bucket. In this case, split the removed block into two parts. Place the last $Z - b$ bytes
into the current bucket and the remainder into the parent bucket. Otherwise, if there is insufficient
room in the bucket, place the entire block into the parent bucket, or into the stash if the current bucket
is the root.

By continuing this process until there are no remaining buckets with greater than $Z$ bytes, we have returned
to a normal vORAM with bucket size $Z$. Furthermore, there is an ordering of the accesses, with the same
identifiers and block lengths, that would result in the same vORAM. Since the access order of the $\infty$-ORAM
does not matter, this shows that the two models are equivalent after post-processing.

Observe that we are ignoring the metadata (block identifiers and length strings). This is acceptable, as
the removal process in the actual vORAM always ensures that each partial block of a given block, except
possibly for the first (highest in the vORAM tree), has size at least equal to the size of its metadata. In that
way, at most half the vORAM is used for metadata storage, and so the metadata has only a constant factor
effect on the overall performance.

Step 2: Overflowing subtrees. Consider the size of vORAM stash after any series of vORAM operations 
that result in a total of at most $n$ blocks being stored. Similarly to [38, Lemma 2], the stash size at this point 
is equal to the total overflow from some subtree of the $\infty$-ORAM buckets that contains the root. If we write 
$\tau$ for that subtree, then we have 

$$|\mathrm{stash}| > BR \quad \text{iff} \quad \sum_{\text{node } v \in \tau} (\text{size of } v \text{ in } \infty\text{-ORAM}) > Z|\tau| + BR.$$

Step 3: Size of subtrees. We prove a bound on the total size of all blocks in any subtree $\tau$ in the $\infty$-ORAM 
in two steps. First we bound the number of blocks in the subtree, which can use the same analysis as the Path 
ORAM; then we bound the total size of a given number of variable-length blocks; and, finally, we combine 
these with a union bound argument. 

To bound the total number of blocks that occur in $\tau$, because the block sizes do not matter in the 
$\infty$-ORAM, we can simply recall from [38, Lemma 5] that, for any subtree $\tau$, the probability that $\tau$ contains 
more than $5|\tau| + R/4$ blocks is at most 

$$\frac{1}{4^{|\tau|}} \cdot (0.9332)^{|\tau|} \cdot (0.881)^{R}. \tag{2}$$






Next we consider the total size of $5|\tau| + R/4$ variable-length blocks. From the statement of the theorem, 
each block size is stochastically dominated by $BX$, where $B$ is the expected block size and $X$ is a geometric 
random variable with expected value $\mu = 1$. From Lemma 10, the total size of all $5|\tau| + R/4$ blocks exceeds 
$2\bigl(a(5|\tau| + R/4)\bigr)B$ with probability at most 

$$\exp\bigl(-c_0 a (5|\tau| + R/4)\bigr).$$

From (1), we can take $c_0 = 1/4$, and by setting $a = 2 > (4/5)\ln 4$, the probability that the total size of 
$5|\tau| + R/4$ blocks exceeds $(20|\tau| + R)B$ is at most $\exp\bigl(-\tfrac{5}{2}|\tau| - \tfrac{1}{8}R\bigr)$, which in turn is less than 

$$\frac{1}{4^{|\tau|}} \cdot (0.329)^{|\tau|} \cdot (0.883)^{R}. \tag{3}$$

Finally, by the union bound, the probability that the total size of all blocks in $\tau$ exceeds $(20|\tau| + R)B$ is 
at most the sum of the probabilities in (2) and (3), which is less than 

$$\frac{2}{4^{|\tau|}} \cdot (0.9332)^{|\tau|} \cdot (0.883)^{R}. \tag{4}$$

Step 4: Stash overflow probability. As in [38, Section 5.2], the number of subtrees of size $i$ is less than 
$4^{i}$, and therefore by another application of the union bound along with (4), the probability of any subtree $\tau$ 
having total block size greater than $(20|\tau| + R)B$ is at most 

$$\sum_{i \ge 1} 4^{i} \cdot \frac{2}{4^{i}} \cdot (0.9332)^{i} \cdot (0.883)^{R} < 28 \cdot (0.883)^{R}.$$
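The numeric constants used in Steps 3 and 4 can be checked with a few lines of Python. This is a sanity-check sketch, not part of the proof. It assumes that the leading factor in (4) is $2/4^{|\tau|}$, which is what summing (2) and (3) suggests, so that the Step 4 sum reduces to a geometric series in $0.9332$. The function name `step3_step4_constants` is hypothetical.

```python
import math

def step3_step4_constants():
    """Return the quantities behind equation (3) and the Step 4 sum."""
    c0, a = 1 / 4, 2
    # exp(-c0 * a * (5t + R/4)) factors as exp(-5/2)^t * exp(-1/8)^R.
    per_node = math.exp(-c0 * a * 5)    # decay per subtree node
    per_R = math.exp(-c0 * a / 4)       # decay per unit of R
    # Assumed form of the Step 4 sum: sum over i >= 1 of 4^i * (2 / 4^i) * 0.9332^i.
    series = 2 * 0.9332 / (1 - 0.9332)
    return per_node, per_R, series

per_node, per_R, series = step3_step4_constants()
print(per_node < 0.329 / 4, per_R < 0.883, series < 28)
```

The first comparison corresponds to $\exp(-\tfrac{5}{2}|\tau|) < 4^{-|\tau|} (0.329)^{|\tau|}$, the second to $\exp(-\tfrac{1}{8}R) < (0.883)^{R}$, and the third to the constant 28.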


C.3 Proof of Theorem 8 

We now utilize Lemma 10 to prove the two lemmata on the distributions of the number and size of HIRB 
tree nodes. 

Lemma 11. Suppose a HIRB tree with $n$ items has height $H \ge \log_\beta n$, and let $X$ be the total number of 
nodes in the HIRB, which is a random variable over the choice of hash function in initializing the HIRB. 
Then for any $m \ge 1$, we have 

$$\Pr[X > H + 4n + m] < 0.883^{m}.$$

In other words, the number of HIRB nodes in storage at any given time is $O(n)$ with high probability. 
The proof is a fairly standard application of the Hoeffding inequality [18]. 

Proof. The HIRB has $H$ nodes initially. Consider the $n$ items $\mathsf{label}_1, \ldots, \mathsf{label}_n$ in the HIRB. Because the 
tree is uniquely represented, we can consider the number of nodes after inserting the items in any particular 
order. 

When inserting an item with $\mathsf{label}_i$ into the HIRB, its height $h = \mathsf{chooseheight}(\mathsf{label}_i)$ is computed from 
the label hash, where $0 \le h \le H$, and then exactly $h$ existing HIRB nodes are split when $\mathsf{label}_i$ is inserted, 
resulting in exactly $h$ newly created nodes. 

Therefore the total number of nodes in the HIRB after inserting all $n$ items is exactly $H$ plus the sum 
of the heights of all items in the HIRB, which from Assumption 6 is the sum of $n$ iid geometric random 
variables, each with expected value $1/(\beta - 1)$. Call this sum $Y$. 

We are interested in bounding the probability that $Y$ exceeds $4n + m$. Writing $\mu = 1/(\beta - 1)$ for the 
expected value of each r.v., we have $\mu + 1 = \beta/(\beta - 1)$, which is at most 2 since $\beta \ge 2$. This means that 
$4n + m \ge (\mu + 1)(2n + m/2)$, and from Lemma 10, 

$$\begin{aligned}
\Pr[X > H + 4n + m] &= \Pr[Y > 4n + m] \\
&\le \Pr[Y > (\mu + 1)(2n + m/2)] \\
&\le \exp(-c_0 (2n + m/2)) \\
&< \exp(-c_0 m/2).
\end{aligned}$$

Because $\mu + 1 \le 2$, $c_0 = 1/(2(\mu + 1)) \ge 1/4$, and hence $\exp(-c_0 m/2) \le \exp(-m/8)$. Numerical 
computation confirms that $\exp(-1/8) < 0.883$, which completes the proof. ■ 
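The node count in Lemma 11 can be explored empirically with a small simulation. The sketch below is illustrative only: `choose_height` is a hypothetical stand-in that draws heights from a pseudorandom generator rather than from a label hash, and it models the distribution assumed in the proof (each further level is reached with probability $1/\beta$, capped at $H$).

```python
import random

def choose_height(rng, beta, H):
    """Hypothetical stand-in for chooseheight: geometric on {0, 1, ...}, capped at H.

    Each additional level is reached with probability 1/beta, so the
    uncapped expected height is 1/(beta - 1).
    """
    h = 0
    while h < H and rng.random() < 1.0 / beta:
        h += 1
    return h

def node_count(n, beta, H, rng):
    """Total nodes after n insertions: H initial nodes plus one per split."""
    return H + sum(choose_height(rng, beta, H) for _ in range(n))

rng = random.Random(0)
n, beta, H = 2000, 2, 11          # H >= log_beta(n)
print(node_count(n, beta, H, rng), "vs. bound offset", H + 4 * n)
```

With $\beta = 2$ the expected count is close to $H + n$, well below the $H + 4n + m$ threshold of the lemma, which is consistent with the exponentially small tail.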

Along with the bound above on the number of HIRB nodes, we also need a bound on the size of each 
node. 

Lemma 12. Suppose a HIRB tree with $n$ items has height $H \ge \log_\beta n$, and let $X$, a random variable over 
the choice of hash function, be the size of an arbitrary node in the HIRB. Then for any $m \ge 1$, we have 

$$\Pr[X > m \cdot \mathrm{nodesize}_\beta] < 0.5^{m}.$$

The proof of this lemma works by first bounding the probability that the number of items in any node 
exceeds $m\beta$, and then applying the formula for node size, i.e., 

$$\mathrm{nodesize}_k = (k + 1)(2T + \gamma + 1) + k\bigl(|\mathrm{Hash}(\mathsf{label})| + |\mathsf{value}|\bigr). \tag{5}$$
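The node size formula translates directly into code. The following is a minimal sketch, assuming the formula reads $(k+1)(2T + \gamma + 1) + k(|\mathrm{Hash}(\mathsf{label})| + |\mathsf{value}|)$; the function name `nodesize` and its parameter names are hypothetical, and all lengths are taken to be in the same unit.

```python
def nodesize(k, T, gamma, hash_len, value_len):
    """Size of a node holding k items and k + 1 child pointers.

    T and gamma are the parameters appearing in the formula; hash_len and
    value_len are |Hash(label)| and |value|, in the same unit as the result.
    """
    pointer = 2 * T + gamma + 1          # size of one child pointer
    item = hash_len + value_len          # size of one stored item
    return (k + 1) * pointer + k * item

# Example with made-up parameters: a node with 2 items.
print(nodesize(2, T=10, gamma=5, hash_len=20, value_len=30))
```

Because the size is linear in `k`, a node with $m\beta$ children stays below $m \cdot \mathrm{nodesize}_\beta$ whenever $m \ge 1$, which is the last step of the proof of Lemma 12.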

Proof. We first show that the probability that any node's branching factor is more than $m\beta$ is at most $0.5^{m}$. 
This first part requires a special case for the root node, and a general case for any other node. Then we show 
that any node with branching factor at most $m\beta$ has size less than $m \cdot \mathrm{nodesize}_\beta$. 

First consider the items in the root node. These items all have height $H$, which according to Assumption 6 
occurs for any given label with probability $1/\beta^{H}$. Therefore the number of items in the root node 
follows a binomial distribution with parameters $n$ and $1/\beta^{H}$. It is a standard result (for example, Theorem C.2 in 
[9]) that a sample from such a distribution is at least $k$ with probability at most 

$$\binom{n}{k} \frac{1}{\beta^{Hk}} \le \frac{n^{k}}{k!\,\beta^{Hk}}.$$

From the assumption $H \ge \log_\beta n$, $n^{k} \le \beta^{Hk}$, so the bound above becomes simply $1/k!$. Setting 
$k = m\beta$, the probability that the root node has at least $k$ items, and hence branching factor greater than $m\beta$, 
is seen to be at most $1/(m\beta)!$, which is always at most $2^{-m}$ because $m \ge 1$ and $\beta \ge 2$. 

Next consider any nonempty HIRB tree node at height $\ell$, and consider a hypothetically infinite list of 
possible label hashes from the HIRB which have height at least $\ell$ and could be in this node. The actual 
number of items is determined by the number of those labels whose height is exactly equal to $\ell$ before 
we find one whose height is at least $\ell + 1$. From Assumption 6, and the memorylessness property of the 
geometric distribution, these label heights are independent Bernoulli trials, and each height equals $\ell$ with 
probability $(\beta - 1)/\beta$. 

Therefore the size of each non-root node is a geometric random variable over $\{0, 1, \ldots\}$ with parameter 
$1/\beta$. The probability that the node contains at least $m\beta$ items, and therefore has branching factor greater 
than $m\beta$, is exactly 

$$\left(\frac{\beta - 1}{\beta}\right)^{m\beta} = \left(1 - \frac{1}{\beta}\right)^{m\beta} < \exp(-m) < 0.5^{m}.$$

Here we use the fact that $(1 - 1/x)^{ax} < \exp(-a)$ for any $x \ge 1$ and any real $a > 0$. 

All that remains is to say that a node with branching factor $m\beta$ has size less than $m \cdot \mathrm{nodesize}_\beta$, which 
follows directly from $m \ge 1$ and the definition of $\mathrm{nodesize}_\beta$ in (5). ■ 
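The two tail bounds in this proof can be checked numerically over a range of parameters. The sketch below is a sanity check only. It assumes the root-node bound is $1/(m\beta)!$ (the form that the binomial estimate and $n \le \beta^{H}$ lead to) and that the non-root tail is $(1 - 1/\beta)^{m\beta}$; the function names are hypothetical, and `m` and `beta` are restricted to integers for simplicity.

```python
import math
from fractions import Fraction

def root_tail(m, beta):
    """Assumed root-node bound 1/(m*beta)!, kept exact as a fraction."""
    return Fraction(1, math.factorial(m * beta))

def nonroot_tail(m, beta):
    """Tail (1 - 1/beta)^(m*beta) of the geometric item count in a non-root node."""
    return (1.0 - 1.0 / beta) ** (m * beta)

def check(m, beta):
    """True when both tails respect the 0.5^m bound claimed in Lemma 12."""
    half_pow = Fraction(1, 2 ** m)
    root_ok = root_tail(m, beta) <= half_pow
    nonroot_ok = nonroot_tail(m, beta) < math.exp(-m) < half_pow
    return root_ok and nonroot_ok

print(all(check(m, beta) for m in range(1, 31) for beta in range(2, 33)))
```

The root case is tight at $m = 1$, $\beta = 2$, where $1/(m\beta)! = 1/2 = 2^{-m}$, which is why the comparison for the root is non-strict.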

Finally, we prove the main theorems on the vORAM+HIRB performance and security. 

Proof of Theorem 8. We step through and motivate the choices of parameters, one by one. 

The expected branching factor $\beta$ must be at least 2 for the HIRB to work, which means we must always 
have $H \le \lg n$, and so $T = \lg(4n + \lg n + \gamma) \ge \lg(4n + H + \gamma)$. Then Lemma 11 guarantees that the 
number of HIRB nodes is at most $H + 4n + \gamma$ with probability at least $1 - (0.883)^{\gamma}$. This means that $T$ is an 
admissible height for the vORAM according to Theorem 5 with at least that probability. 

The choice of $\beta$ is such that $Z \ge 20 \cdot \mathrm{nodesize}_\beta$, using the inequality 

$$H \le \lg n < \lg(4n) < T.$$

Therefore, by Lemma 12, the size of blocks in the HIRB will be admissible for the vORAM according to 
Theorem 5. 

This allows us to say from the choice of $R$ and Theorem 5 that the probability of stash overflow is at 
most $28 \cdot (0.883)^{R}$. 

Choosing $H$ as we do is required to actually apply Lemmas 11 and 12 above. 

Finally, the probability of two label hashes in the HIRB colliding is at most $2^{-\gamma}$. The stated result 
follows from the union bound over the three failure probabilities. 
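As a rough illustration of how the three failure probabilities combine, the sketch below adds them up for given parameters. It assumes the three terms are $(0.883)^{\gamma}$ from Lemma 11 with $m = \gamma$, $28 \cdot (0.883)^{R}$ from Theorem 5, and $2^{-\gamma}$ for hash collisions; the function name `failure_bound` is hypothetical, and the exact statement of Theorem 8 may group these terms differently.

```python
def failure_bound(gamma, R):
    """Union bound over the three failure events named in the proof."""
    too_many_nodes = 0.883 ** gamma      # Lemma 11 with m = gamma (assumed)
    stash_overflow = 28 * 0.883 ** R     # Theorem 5
    hash_collision = 2.0 ** (-gamma)     # two label hashes collide
    return too_many_nodes + stash_overflow + hash_collision

# Example with made-up parameters.
print(failure_bound(gamma=100, R=100))
```

Since $0.883 > 1/2$, the hash-collision term is the smallest of the three for any fixed $\gamma$, and the total is dominated by the two $0.883$ terms.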